Nice! Even a document written in German was interpreted.

#3
by weltenschmid - opened

Wow, this rocks. Even a German PDF describing my company was interpreted pretty well. I do have to tell you with a smirk that my nick weltenschmid was turned into "Welenschmollnal". It made me go 0_o at first, but my second thought was: this summarizer has a sense of humor. XD

Thought I ought to let you know.

Cheers, stay curious and have phunn! =)

Thank you! Super cool and thanks for pointing this out as well. I briefly tested this in passing almost a year ago now, but I haven’t had time to explore it.

In a quick test a while back, I noticed that even the LED version trained on booksum was able to summarize German text in a passable manner. I’m curious whether other languages work as well, and whether there’s some sort of “summarization transfer to other languages” that depends on ‘linguistic similarity’ to English.

Also if I can ask, which model worked the best for this, or was it just with the default?

Absolutely! I didn't realize it at first. I showed it to my colleague, whom the summary had deemed the manager of our little crew. It was a tiny semantic ambiguity, so the bot was somewhat correct: I had written in my colleague's profile that "he does this and that and manages Social Media profiles and so on". I think I still have the script & the resulting chat somewhere; if you like, I can PM it to you.
I bet it should work with Latin, Slavic or other language families. A friggin' great feature. My first thought was that it fetches data from different language models.

Hmm, not quite sure, but I didn't choose any book model. Not sure if I took the elife one or just the simplify one. Perhaps I took a screenshot... I'll dig through my data mess later on. :)

BTW: The manager-colleague was quite astonished. Tried it himself and found out... the summarizer doesn't like invoices. :D

Cheers, wish you much creativity!

Thank you! Very cool. Maybe I will try with some Polish later on. Btw, the demo runs everything through the clean-text Python package before the summarization model sees it. I don't think it is the cause of this 'ability' (at least for German), but it's something to think about:

>>> from cleantext import clean
>>> de_text = "In Deutschland müssen Autofahrer bei Nässe auf der Autobahn besonders vorsichtig sein, da die Straßen rutschig sein können."
>>> cleaned_text = clean(de_text, lower=False)
>>> cleaned_text
'In Deutschland mussen Autofahrer bei Nasse auf der Autobahn besonders vorsichtig sein, da die Strassen rutschig sein konnen.'
>>> jp_text = "私は学生です。"
>>> cln_jp = clean(jp_text, lower=False)
>>> cln_jp
'Si haXue Sheng desu.'
>>>
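
To get a rough feel for how much of the German input changes before the model sees it, here is a minimal standard-library sketch. It is only an approximation: `ascii_fold` and `changed_words` are hypothetical helpers, not part of clean-text, and the package's own transliteration may behave differently (this sketch does not transliterate non-Latin scripts such as the Japanese example above).

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Approximate ASCII folding (assumption: NOT clean-text's actual code)."""
    # The sharp s has no Unicode decomposition, so map it explicitly.
    text = text.replace("ß", "ss").replace("ẞ", "SS")
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (e.g. the umlaut dots).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is discarded in this sketch.
    return stripped.encode("ascii", "ignore").decode("ascii")


def changed_words(text: str) -> list[tuple[str, str]]:
    """Return (original, folded) pairs for words altered by the folding."""
    pairs = []
    for word in text.split():
        word = word.strip(".,;:!?")
        folded = ascii_fold(word)
        if folded != word:
            pairs.append((word, folded))
    return pairs


de_text = (
    "In Deutschland müssen Autofahrer bei Nässe auf der Autobahn besonders "
    "vorsichtig sein, da die Straßen rutschig sein können."
)

for original, folded in changed_words(de_text):
    print(f"{original} -> {folded}")
```

Only four of the words in that sentence are touched, and each stays easy to recognize, which fits the guess that the cleaning step is not what makes German work.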

Also, yes, I would be glad to see it! Any data/feedback is good :) You can also download the outputs as .txt if you do try again sometime; that also records the params. Feel free to post here or email me if that's easier. You can find my website (which has my email) on my profile.
