Spaces:

mikeee
/

radiobee-aligner

freemt committed on May 17, 2022

Commit

1ca37ad

•

1 Parent(s): 265100f

Update docs

Files changed (17)

data/en.txt +2 -2
data/zh.txt +1 -1
docs/build/doctrees/environment.pickle +0 -0
docs/build/doctrees/examples.doctree +0 -0
docs/build/doctrees/intro.doctree +0 -0
docs/build/html/_sources/examples.rst.txt +2 -0
docs/build/html/_sources/intro.rst.txt +1 -1
docs/build/html/examples.html +1 -0
docs/build/html/intro.html +1 -1
docs/build/html/searchindex.js +1 -1
docs/source/examples.rst +2 -0
docs/source/intro.rst +1 -1
radiobee/__init__.py +1 -0
radiobee/detect.py +1 -1
radiobee/radiobee_cli.py +545 -0
radiobee/trim_df.py +2 -6
requirements.txt +4 -1

data/en.txt CHANGED

@@ -1,5 +1,5 @@
-[Young Warrior] Kingold(184283681) 2021-12-30 22:27:37
-It seems that the standalone version can
 omit the GUI and specify the two files to be aligned directly on the command line.

+[Young Warrior] Kingold(...) 2021-12-30 22:27:37
+It seems that the standalone version can
 omit the GUI and specify the two files to be aligned directly on the command line.

data/zh.txt CHANGED

@@ -1,4 +1,4 @@
-【少侠】Kingold(184283681) 2021-12-30 22:27:37
 单机版貌似可以省略掉图形界面，直接
 命令行指定两个待对齐文件。

+【少侠】Kingold(...) 2021-12-30 22:27:37
 单机版貌似可以省略掉图形界面，直接
 命令行指定两个待对齐文件。

docs/build/doctrees/environment.pickle CHANGED

Binary files a/docs/build/doctrees/environment.pickle and b/docs/build/doctrees/environment.pickle differ

docs/build/doctrees/examples.doctree CHANGED

Binary files a/docs/build/doctrees/examples.doctree and b/docs/build/doctrees/examples.doctree differ

docs/build/doctrees/intro.doctree CHANGED

Binary files a/docs/build/doctrees/intro.doctree and b/docs/build/doctrees/intro.doctree differ

docs/build/html/_sources/examples.rst.txt CHANGED

@@ -3,6 +3,8 @@ Examples
 ``radiobee`` has in-built examples. Just click one of the rows in the ``Examples`` table and click ``Submit`` to testrun.
 Installation/Usage:
 *******************
 As the package has not been published on PyPi yet, it CANNOT be installed using pip.

 ``radiobee`` has in-built examples. Just click one of the rows in the ``Examples`` table and click ``Submit`` to testrun.
+`gradio 3` (run in hf spaces) seems to have trouble with examples. Hence, examples may be taken off line until the problem is fixed.
 Installation/Usage:
 *******************
 As the package has not been published on PyPi yet, it CANNOT be installed using pip.

docs/build/html/_sources/intro.rst.txt CHANGED

@@ -18,4 +18,4 @@ Limitations
 Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
 If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.
-An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introdueced for other laugnage pairs.

 Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
 If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.
+An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introduced for other laugnage pairs.

docs/build/html/examples.html CHANGED

@@ -76,6 +76,7 @@
   <section id="examples">
 <h1>Examples<a class="headerlink" href="#examples" title="Permalink to this headline"></a></h1>
 <p><code class="docutils literal notranslate"><span class="pre">radiobee</span></code> has in-built examples. Just click one of the rows in the <code class="docutils literal notranslate"><span class="pre">Examples</span></code> table and click <code class="docutils literal notranslate"><span class="pre">Submit</span></code> to testrun.</p>
 <section id="installation-usage">
 <h2>Installation/Usage:<a class="headerlink" href="#installation-usage" title="Permalink to this headline"></a></h2>
 <p>As the package has not been published on PyPi yet, it CANNOT be installed using pip.</p>

   <section id="examples">
 <h1>Examples<a class="headerlink" href="#examples" title="Permalink to this headline"></a></h1>
 <p><code class="docutils literal notranslate"><span class="pre">radiobee</span></code> has in-built examples. Just click one of the rows in the <code class="docutils literal notranslate"><span class="pre">Examples</span></code> table and click <code class="docutils literal notranslate"><span class="pre">Submit</span></code> to testrun.</p>
+<p><cite>gradio 3</cite> (run in hf spaces) seems to have trouble with examples. Hence, examples may be taken off line until the problem is fixed.</p>
 <section id="installation-usage">
 <h2>Installation/Usage:<a class="headerlink" href="#installation-usage" title="Permalink to this headline"></a></h2>
 <p>As the package has not been published on PyPi yet, it CANNOT be installed using pip.</p>

docs/build/html/intro.html CHANGED

@@ -87,7 +87,7 @@
 <h2>Limitations<a class="headerlink" href="#limitations" title="Permalink to this headline"></a></h2>
 <p>Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
 If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.</p>
-<p>An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introdueced for other laugnage pairs.</p>
 </section>
 </section>

 <h2>Limitations<a class="headerlink" href="#limitations" title="Permalink to this headline"></a></h2>
 <p>Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
 If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.</p>
+<p>An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introduced for other laugnage pairs.</p>
 </section>
 </section>

docs/build/html/searchindex.js CHANGED

@@ -1 +1 @@

- Search.setIndex({docnames:["examples","index","intro","modules","radiobee","userguide","userguide-zh"],envversion:{"sphinx.domains.c":2,"sphinx.domains.changeset":1,"sphinx.domains.citation":1,"sphinx.domains.cpp":4,"sphinx.domains.index":1,"sphinx.domains.javascript":2,"sphinx.domains.math":2,"sphinx.domains.python":3,"sphinx.domains.rst":2,"sphinx.domains.std":2,sphinx:56},filenames:["examples.rst","index.rst","intro.rst","modules.rst","radiobee.rst","userguide.rst","userguide-zh.rst"],objects:{},objnames:{},objtypes:{},terms:{"1":[5,6],"10":2,"12":[5,6],"2":[5,6],"200":[5,6],"2000":[5,6],"3":2,"316287378":[5,6],"4":[5,6],"~~5":[],"~~500":6,"8":[5,6],"\u4e00\u822c\u65e0\u9700\u7406\u4f1a\u8fd9\u4e9b\u53c2\u6570":6,"\u4e2d\u82f1\u975e\u7a7a\u884c\u9650\u5236\u5728":6,"\u4e3a\u4e2d\u82f1\u6587\u6df7\u5408\u6587\u672c\u53ca\u8bd5\u7740\u5206\u79bb\u4e2d\u82f1\u6587":6,"\u4e3a\u7a7a\u767d\u65f6":6,"\u4e86\u89e3\u8fd9\u4e9b\u5bf9\u9f50\u5de5\u5177":6,"\u4ee5\u5185":6,"\u4ee5\u540e\u53ef\u80fd\u4f1a\u652f\u6301":6,"\u4f18\u8d28\u5bf9":6,"\u4f7f\u7528\u8bf4\u660e":1,"\u5176\u4ed6\u8bed\u8a00\u5bf9\u7684\u5bf9\u9f50":6,"\u5219\u4f1a\u89c6":6,"\u5219\u9650\u5236\u5728":6,"\u53e6\u4e00\u65b9\u9762":6,"\u53ef\u4ee5\u53f3\u51fb\u62f7\u51fa\u56fe\u7684\u94fe\u63a5\u7528\u6d4f\u89c8\u5668\u72ec\u7acb\u8bbf\u95ee\u62f7\u51fa\u6765\u7684\u94fe\u63a5\u6216\u53f3\u51fb\u5b58\u76d8\u518d\u7528\u770b\u56fe\u7a0b\u5e8f\u6253\u5f00\u5b58\u76d8\u7684\u56fe\u6587\u4ef6":6,"\u548c":6,"\u5acc\u56fe\u592a\u5c0f\u7684\u8bdd":6,"\u5b58\u4e0b\u6709\u5173\u53c2\u6570\u67e5\u770b\u6216\u901a\u77e5\u5f00\u53d1\u8005":6,"\u5bf9\u7ea6\u97005\u5206\u949f":6,"\u5feb\u5bf9\u6a21\u5f0f\u76ee\u524d\u4ec5\u652f\u6301\u4e2d\u82f1":6,"\u662f":6,"\u6700\u5c0f":6,"\u7136\u540e\u8fdb\u884c\u5bf9\u9f50":6,"\u7684\u5b6a\u751f\u5144\u5f1f":6,"\u7684\u5efa\u8bae\u503c":6,"\u76ee\u524d\u4ec5\u652f\u6301\~~u4e2d\u82f1":[],"\u76ee\u524d\u4ec5\u652f\u6301\~~u7eaf\u6587\u672c\u6587\u4ef6\u4e0a\u8f7d":6,"\u7b2c\u4e8c
\u6b21\u4e0a\u8f7d\u6587\u4ef6\u524d\u8bf7\u70b9\u51fb":6,"\u7b49":6,"\u7b49\u683c\u5f0f":6,"\u82f1\u4e2d":6,"\u82f1\u4e2d\u5bf9\u9f50":6,"\u8bbe\u5927\u4e9b\u5219\u4f1a\u5f97\u5230\u5c11\u4e00\u4e9b\u5bf9\u9f50\u5bf9\u56e0\u4e3a\u53ef\u80fd\u9519\u5931\u4e86\u4e00\u4e9b":6,"\u8bbe\u5927\u4e9b\u6216":6,"\u8bbe\u5c0f\u4e9b\u53ef\u4ee5\u5f97\u5230\u66f4\u591a\u7684\u5bf9\u9f50\u5bf9\u4f46\u4e5f\u4f1a\u6709\u66f4\u591a":6,"\u8bbe\u5c0f\u4e9b\u6216":6,"\u8bef\u62a5\u5bf9":6,"\u8bf7\u52a0\u5165qq\u7fa4":6,"\u8fd0\u884c\u51fa\u9519\u65f6\u53ef\u4ee5\u70b9\u51fb":6,"\u9519\u8bef\u5224\u65ad\u4e3a\u5bf9\u9f50\u7684\u5bf9":6,"do":5,"new":5,As:0,For:0,If:[2,5],On:5,The:[2,5],To:5,about:5,ad:2,address:5,aim:2,align:[0,2,5,6],align_s:[1,3],align_text:[1,3],also:5,although:2,amend_avec:[1,3],an:2,app:[1,3],applic:2,approxim:2,ar:[2,5],attempt:5,been:[0,2],befor:5,better:5,blank:5,browser:5,built:0,bumblebe:[5,6],can:5,candid:5,cannot:0,cat:2,chines:5,clear:[5,6],click:[0,5],cmat2tset:[1,3],co:0,contact:2,content:3,copi:5,csv:[5,6],current:2,de:2,develop:[2,5],dl_type:[5,6],docterm_scor:[1,3],docx:[5,6],download:0,dual:2,dualtext:2,e:2,ebook:2,educ:2,en2zh:[1,3],en2zh_token:[1,3],en:[2,5],english:5,epsilon:[5,6],esp:[5,6],etc:[2,5],exampl:[1,2,5],experiment:2,fals:5,fast:2,file2text:[1,3],file:[5,6],files2df:[1,3],find:2,first:5,flag:[5,6],format:5,full:2,further:2,g:2,gen_aset:[1,3],gen_eps_minsampl:[1,3],gen_model:[1,3],gen_pset:[1,3],gen_row_align:[1,3],go:5,good:5,gradio:2,group:5,ha:[0,2],hand:5,have:5,help:2,~~here~~:[],how:1,html:[5,6],http:0,huggingfac:0,identifi:5,idf_typ:[5,6],imag:5,implement:2,index:1,inform:5,insert_spac:[1,3],instal:1,interfac:2,interpolate_pset:[1,3],introduct:1,~~introduec:2,~~ja:2,join:5,just:0,know:5,languag:2,languang:5,larger:5,later:5,laugnag:2,learn:2,left:5,limit:[1,5],line:5,lists2cmat:[1,3],loadtext:[1,3],look:5,machin:2,mai:5,mani:2,md:[5,6],mdx_e2c:[1,3],method:0,mikee:0,min_sampl:[5,6],minimum:5,~~minut:[],~~miss:5,mix:5,mode:2,modul:[
1,3],more:5,motiv:1,need:5,non:5,norm:[5,6],normal:5,now:0,number:5,one:0,onli:2,onlin:0,open:5,other:[2,5],output:5,packag:[0,1,3],page:1,pair:[2,5],paragraph:2,particular:2,pdf:[5,6],~~per:[],~~permit:2,pip:0,pleas:5,plot_cmat:[1,3],plot_df:[1,3],posit:5,power:2,proced:5,process_upload:[1,3],properli:2,provid:2,publish:0,pure:5,pypi:0,python:2,qq:5,radiobe:[0,2,5,6],requir:2,result:5,right:5,row:0,ru:2,save:5,search:1,seg_text:[1,3],select:5,sentenc:2,separ:5,should:5,shuffle_s:[1,3],sibl:5,slow:2,smaller:5,smatrix:[1,3],someth:5,space:0,srt:[5,6],submit:[0,5],submodul:[1,3],subsequ:5,suggest:[0,5],support:[2,5],tab:5,tabl:0,tend:5,term:2,testrun:0,text:[2,5],tf_type:[5,6],them:5,time:2,tmx:2,touch:5,track:2,translat:2,treat:5,trim_df:[1,3],two:2,txt:[5,6],unless:5,upload:5,us:[0,1],usag:1,valu:5,version:0,~~wa:[],~~welcom:2,what:5,when:[2,5],willing:2,wrong:5,yet:0,you:[2,5],zh:[2,5],zip:0},titles:["Examples","Welcome to radiobee\u2019s documentation!","Introduction","radiobee","radiobee package","How to use","\u4f7f\u7528\u8bf4\u660e"],titleterms:{"\u4f7f\u7528\u8bf4\u660e":6,align_s:4,align_text:4,amend_avec:4,app:4,cmat2tset:4,content:[1,4],docterm_scor:4,document:1,en2zh:4,en2zh_token:4,exampl:0,file2text:4,files2df:4,gen_aset:4,gen_eps_minsampl:4,gen_model:4,gen_pset:4,gen_row_align:4,how:5,indic:1,insert_spac:4,instal:0,interpolate_pset:4,introduct:2,limit:2,lists2cmat:4,loadtext:4,mdx_e2c:4,modul:4,motiv:2,packag:4,plot_cmat:4,plot_df:4,process_upload:4,radiobe:[1,3,4],s:1,seg_text:4,shuffle_s:4,smatrix:4,submodul:4,tabl:1,trim_df:4,us:5,usag:0,welcom:1}})

+ Search.setIndex({docnames:["examples","index","intro","modules","radiobee","userguide","userguide-zh"],envversion:{"sphinx.domains.c":2,"sphinx.domains.changeset":1,"sphinx.domains.citation":1,"sphinx.domains.cpp":4,"sphinx.domains.index":1,"sphinx.domains.javascript":2,"sphinx.domains.math":2,"sphinx.domains.python":3,"sphinx.domains.rst":2,"sphinx.domains.std":2,sphinx:56},filenames:["examples.rst","index.rst","intro.rst","modules.rst","radiobee.rst","userguide.rst","userguide-zh.rst"],objects:{},objnames:{},objtypes:{},terms:{"1":[5,6],"10":2,"12":[5,6],"2":[5,6],"200":[5,6],"2000":[5,6],"3":[0,2],"316287378":[5,6],"4":[5,6],"500":6,"8":[5,6],"\u4e00\u822c\u65e0\u9700\u7406\u4f1a\u8fd9\u4e9b\u53c2\u6570":6,"\u4e2d\u82f1\u975e\u7a7a\u884c\u9650\u5236\u5728":6,"\u4e3a\u4e2d\u82f1\u6587\u6df7\u5408\u6587\u672c\u53ca\u8bd5\u7740\u5206\u79bb\u4e2d\u82f1\u6587":6,"\u4e3a\u7a7a\u767d\u65f6":6,"\u4e86\u89e3\u8fd9\u4e9b\u5bf9\u9f50\u5de5\u5177":6,"\u4ee5\u5185":6,"\u4ee5\u540e\u53ef\u80fd\u4f1a\u652f\u6301":6,"\u4f18\u8d28\u5bf9":6,"\u4f7f\u7528\u8bf4\u660e":1,"\u5176\u4ed6\u8bed\u8a00\u5bf9\u7684\u5bf9\u9f50":6,"\u5219\u4f1a\u89c6":6,"\u5219\u9650\u5236\u5728":6,"\u53e6\u4e00\u65b9\u9762":6,"\u53ef\u4ee5\u53f3\u51fb\u62f7\u51fa\u56fe\u7684\u94fe\u63a5\u7528\u6d4f\u89c8\u5668\u72ec\u7acb\u8bbf\u95ee\u62f7\u51fa\u6765\u7684\u94fe\u63a5\u6216\u53f3\u51fb\u5b58\u76d8\u518d\u7528\u770b\u56fe\u7a0b\u5e8f\u6253\u5f00\u5b58\u76d8\u7684\u56fe\u6587\u4ef6":6,"\u548c":6,"\u5acc\u56fe\u592a\u5c0f\u7684\u8bdd":6,"\u5b58\u4e0b\u6709\u5173\u53c2\u6570\u67e5\u770b\u6216\u901a\u77e5\u5f00\u53d1\u8005":6,"\u5bf9\u7ea6\u97005\u5206\u949f":6,"\u5feb\u5bf9\u6a21\u5f0f\u76ee\u524d\u4ec5\u652f\u6301\u4e2d\u82f1":6,"\u662f":6,"\u6700\u5c0f":6,"\u7136\u540e\u8fdb\u884c\u5bf9\u9f50":6,"\u7684\u5b6a\u751f\u5144\u5f1f":6,"\u7684\u5efa\u8bae\u503c":6,"\u76ee\u524d\u4ec5\u652f\u6301\u7eaf\u6587\u672c\u6587\u4ef6\u4e0a\u8f7d":6,"\u7b2c\u4e8c\u6b21\u4e0a\u8f7d\u6587\u4ef6\u524d\u8bf7\u70b9\u51fb":6,"
\u7b49":6,"\u7b49\u683c\u5f0f":6,"\u82f1\u4e2d":6,"\u82f1\u4e2d\u5bf9\u9f50":6,"\u8bbe\u5927\u4e9b\u5219\u4f1a\u5f97\u5230\u5c11\u4e00\u4e9b\u5bf9\u9f50\u5bf9\u56e0\u4e3a\u53ef\u80fd\u9519\u5931\u4e86\u4e00\u4e9b":6,"\u8bbe\u5927\u4e9b\u6216":6,"\u8bbe\u5c0f\u4e9b\u53ef\u4ee5\u5f97\u5230\u66f4\u591a\u7684\u5bf9\u9f50\u5bf9\u4f46\u4e5f\u4f1a\u6709\u66f4\u591a":6,"\u8bbe\u5c0f\u4e9b\u6216":6,"\u8bef\u62a5\u5bf9":6,"\u8bf7\u52a0\u5165qq\u7fa4":6,"\u8fd0\u884c\u51fa\u9519\u65f6\u53ef\u4ee5\u70b9\u51fb":6,"\u9519\u8bef\u5224\u65ad\u4e3a\u5bf9\u9f50\u7684\u5bf9":6,"do":5,"new":5,As:0,For:0,If:[2,5],On:5,The:[2,5],To:5,about:5,ad:2,address:5,aim:2,align:[0,2,5,6],align_s:[1,3],align_text:[1,3],also:5,although:2,amend_avec:[1,3],an:2,app:[1,3],applic:2,approxim:2,ar:[2,5],attempt:5,been:[0,2],befor:5,better:5,blank:5,browser:5,built:0,bumblebe:[5,6],can:5,candid:5,cannot:0,cat:2,chines:5,clear:[5,6],click:[0,5],cmat2tset:[1,3],co:0,contact:2,content:3,copi:5,csv:[5,6],current:2,de:2,develop:[2,5],dl_type:[5,6],docterm_scor:[1,3],docx:[5,6],download:0,dual:2,dualtext:2,e:2,ebook:2,educ:2,en2zh:[1,3],en2zh_token:[1,3],en:[2,5],english:5,epsilon:[5,6],esp:[5,6],etc:[2,5],exampl:[1,2,5],experiment:2,fals:5,fast:2,file2text:[1,3],file:[5,6],files2df:[1,3],find:2,first:5,fix:0,flag:[5,6],format:5,full:2,further:2,g:2,gen_aset:[1,3],gen_eps_minsampl:[1,3],gen_model:[1,3],gen_pset:[1,3],gen_row_align:[1,3],go:5,good:5,gradio:[0,2],group:5,ha:[0,2],hand:5,have:[0,5],help:2,henc:0,hf:0,how:1,html:[5,6],http:0,huggingfac:0,identifi:5,idf_typ:[5,6],imag:5,implement:2,index:1,inform:5,insert_spac:[1,3],instal:1,interfac:2,interpolate_pset:[1,3],introduc:2,introduct:1,ja:2,join:5,just:0,know:5,languag:2,languang:5,larger:5,later:5,laugnag:2,learn:2,left:5,limit:[1,5],line:[0,5],lists2cmat:[1,3],loadtext:[1,3],look:5,machin:2,mai:[0,5],mani:2,md:[5,6],mdx_e2c:[1,3],method:0,mikee:0,min_sampl:[5,6],minimum:5,miss:5,mix:5,mode:2,modul:[1,3],more:5,motiv:1,need:5,non:5,norm:[5,6],normal:5,no
w:0,number:5,off:0,one:0,onli:2,onlin:0,open:5,other:[2,5],output:5,packag:[0,1,3],page:1,pair:[2,5],paragraph:2,particular:2,pdf:[5,6],permit:2,pip:0,pleas:5,plot_cmat:[1,3],plot_df:[1,3],posit:5,power:2,problem:0,proced:5,process_upload:[1,3],properli:2,provid:2,publish:0,pure:5,pypi:0,python:2,qq:5,radiobe:[0,2,5,6],requir:2,result:5,right:5,row:0,ru:2,run:0,save:5,search:1,seem:0,seg_text:[1,3],select:5,sentenc:2,separ:5,should:5,shuffle_s:[1,3],sibl:5,slow:2,smaller:5,smatrix:[1,3],someth:5,space:0,srt:[5,6],submit:[0,5],submodul:[1,3],subsequ:5,suggest:[0,5],support:[2,5],tab:5,tabl:0,taken:0,tend:5,term:2,testrun:0,text:[2,5],tf_type:[5,6],them:5,time:2,tmx:2,touch:5,track:2,translat:2,treat:5,trim_df:[1,3],troubl:0,two:2,txt:[5,6],unless:5,until:0,upload:5,us:[0,1],usag:1,valu:5,version:0,welcom:2,what:5,when:[2,5],willing:2,wrong:5,yet:0,you:[2,5],zh:[2,5],zip:0},titles:["Examples","Welcome to radiobee\u2019s documentation!","Introduction","radiobee","radiobee package","How to use","\u4f7f\u7528\u8bf4\u660e"],titleterms:{"\u4f7f\u7528\u8bf4\u660e":6,align_s:4,align_text:4,amend_avec:4,app:4,cmat2tset:4,content:[1,4],docterm_scor:4,document:1,en2zh:4,en2zh_token:4,exampl:0,file2text:4,files2df:4,gen_aset:4,gen_eps_minsampl:4,gen_model:4,gen_pset:4,gen_row_align:4,how:5,indic:1,insert_spac:4,instal:0,interpolate_pset:4,introduct:2,limit:2,lists2cmat:4,loadtext:4,mdx_e2c:4,modul:4,motiv:2,packag:4,plot_cmat:4,plot_df:4,process_upload:4,radiobe:[1,3,4],s:1,seg_text:4,shuffle_s:4,smatrix:4,submodul:4,tabl:1,trim_df:4,us:5,usag:0,welcom:1}})

docs/source/examples.rst CHANGED

@@ -3,6 +3,8 @@ Examples
 ``radiobee`` has in-built examples. Just click one of the rows in the ``Examples`` table and click ``Submit`` to testrun.
 Installation/Usage:
 *******************
 As the package has not been published on PyPi yet, it CANNOT be installed using pip.

 ``radiobee`` has in-built examples. Just click one of the rows in the ``Examples`` table and click ``Submit`` to testrun.
+`gradio 3` (run in hf spaces) seems to have trouble with examples. Hence, examples may be taken off line until the problem is fixed.
 Installation/Usage:
 *******************
 As the package has not been published on PyPi yet, it CANNOT be installed using pip.

docs/source/intro.rst CHANGED

@@ -18,4 +18,4 @@ Limitations
 Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
 If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.
-An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introdueced for other laugnage pairs.

 Currently, only zh-en/en-zh pairs are supported in fast-track mode although further pairs will be added if and when time permits.
 If you are willing to help with a particular pair (for example, de-zh, ja-zh, ru-zh, etc.), you are welcome to contact the developer.
+An experimental slow-track mode (time required approximately 10 times that of fast-track mode) is introduced for other laugnage pairs.

radiobee/__init__.py CHANGED

@@ -1 +1,2 @@
 """Init."""

 """Init."""
+__version__ = "0.1.0b"
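
A module-level `__version__` like the one added here is usually surfaced through a `--version` flag. The sketch below shows one way to do that. It uses the standard-library `argparse` purely as a stand-in for illustration; the CLI added in this commit uses typer with a version callback instead. `version_text` and `build_parser` are hypothetical helper names, and `__version__` is redefined locally only so that the snippet runs on its own.

```python
import argparse

# Value taken from the diff above; in the package it lives in radiobee/__init__.py.
__version__ = "0.1.0b"


def version_text(prog: str = "radiobee-cli") -> str:
    """Return the line that a --version flag would print (hypothetical helper)."""
    return f"{prog} {__version__}"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose -V/--version flag prints the version and exits."""
    parser = argparse.ArgumentParser(prog="radiobee-cli")
    parser.add_argument(
        "-V",
        "--version",
        action="version",  # argparse prints `version` and exits with status 0
        version=version_text(),
    )
    return parser


if __name__ == "__main__":
    build_parser().parse_args()
```

Running the script with `-V` prints the version line and exits before any other argument handling takes place, which is the same effect the eager typer callback aims for.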

radiobee/detect.py CHANGED

@@ -29,7 +29,7 @@ def with_func_attrs(**attrs: Any) -> Callable:
 def detect(text: str, set_languages: Optional[List[str]] = None) -> str:
     """Detect language via polyglot and fastlid.
-    check first with fastlid, if conf < 0.3, check with
     Alternative in detec_alt.py
     """

 def detect(text: str, set_languages: Optional[List[str]] = None) -> str:
     """Detect language via polyglot and fastlid.
+    check first with fastlid, if conf < 0.3, check with polyglot.text.Detector
     Alternative in detec_alt.py
     """

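The fallback rule described in the updated docstring can be sketched roughly as follows. This is a minimal illustration, not the code in `radiobee/detect.py`: `detect_with_fallback`, `primary` and `fallback` are hypothetical names, and the two detectors are stand-in callables returning a `(language, confidence)` pair rather than the real fastlid or polyglot interfaces. Only the 0.3 threshold comes from the docstring.

```python
from typing import Callable, Tuple

# A detector maps text to (language code, confidence in [0, 1]).
Detector = Callable[[str], Tuple[str, float]]


def detect_with_fallback(
    text: str,
    primary: Detector,
    fallback: Detector,
    threshold: float = 0.3,
) -> str:
    """Use the primary detector unless its confidence is below the threshold."""
    lang, conf = primary(text)
    if conf < threshold:
        # Low confidence: defer to the second detector's answer.
        lang, _ = fallback(text)
    return lang


def unsure(text: str) -> Tuple[str, float]:
    """Stand-in detector that is never confident (assumption for the demo)."""
    return "en", 0.1


def confident(text: str) -> Tuple[str, float]:
    """Stand-in detector that is always confident (assumption for the demo)."""
    return "zh", 0.9


if __name__ == "__main__":
    print(detect_with_fallback("some text", unsure, confident))
```

Passing the detectors in as arguments keeps the threshold logic testable without installing either language-identification library.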
radiobee/radiobee_cli.py ADDED

	@@ -0,0 +1,545 @@

+"""Run radiobee-cli, based on gradiobee.
+https://stackoverflow.com/questions/71007924/how-can-i-get-a-version-to-the-root-of-a-typer-typer-application
+"""
+# pylint: disable=invalid-name, too-many-arguments, too-many-branches, too-many-locals, too-many-statements, unused-variable, too-many-return-statements, unused-import
+from typing import Optional
+from pathlib import Path
+import platform
+import inspect
+from itertools import zip_longest
+# import tempfile
+# from click import click
+import typer
+from sklearn.cluster import DBSCAN
+from fastlid import fastlid
+from logzero import logger
+from icecream import ic
+import numpy as np  # noqa
+import pandas as pd
+import matplotlib  # noqa
+import matplotlib.pyplot as plt
+import seaborn as sns
+import sys
+if "." not in sys.path:
+    sys.path.append(".")
+# from radiobee.process_upload import process_upload
+from radiobee.files2df import files2df
+from radiobee.file2text import file2text
+from radiobee.lists2cmat import lists2cmat
+from radiobee.gen_pset import gen_pset
+from radiobee.gen_aset import gen_aset
+from radiobee.align_texts import align_texts
+from radiobee.cmat2tset import cmat2tset
+from radiobee.trim_df import trim_df
+from radiobee.error_msg import error_msg
+from radiobee.text2lists import text2lists
+from radiobee.align_sents import align_sents
+from radiobee.shuffle_sents import shuffle_sents  # type: ignore
+from radiobee.paras2sents import paras2sents  # type: ignore
+from radiobee import __version__
+sns.set()
+sns.set_style("darkgrid")
+pd.options.display.float_format = "{:,.2f}".format
+debug = False
+debug = True
+_ = """
+def gradiobee(  # noqa
+    file1,
+    file2,
+    tf_type,
+    idf_type,
+    dl_type,
+    norm,
+    eps,
+    min_samples,
+    # debug=False,
+    sent_ali_algo,
+):
+# """
+app = typer.Typer(
+    add_completion=False,
+)
+def version_callback(value: bool):
+    if value:
+        ver = typer.style(f"{__version__}", fg=typer.colors.GREEN, bold=True)
+        typer.echo(f"radiobee-cli {ver}")
+        raise typer.Exit()
+@app.command()
+def radiobee_cli(
+    file1: str = typer.Argument(..., help="first file name"),
+    file2: str = typer.Argument(None, help="optional second file name (if not provided, the first file will be split into two texts)"),
+    tf_type: str = typer.Option("linear", help="tf type [linear, sqrt, log, binary]"),
+    idf_type: str = typer.Option(None, help="idf type [None, standard, smooth, bm25]"),
+    dl_type: str = typer.Option("", help="dl type [None, linear, sqrt, log]"),
+    norm: str = typer.Option("", help="norm [None, l1, l2]"),
+    eps: float = typer.Option(10, help="epsilon, typically between 1 and 20"),
+    min_samples: int = typer.Option(6, help="minimum samples, typically between 1 and 20"),
+    sent_ali_algo: str = typer.Option("", help="sentence align algorithm [None, fast, slow]"),
+    version: Optional[bool] = typer.Option(
+        None, "--version", "-V", callback=version_callback, is_eager=True,
+    ),
+):
+    """Align dualtext."""
+    logger.debug(" *debug* ")
+    # possible further switches
+    # para_sent: para/sent
+    # sent_ali: default/radio/gale-church
+    plot_dia = True  # noqa
+    # outputs: check return
+    # if outputs is modified, also need to modify error_msg's outputs
+    # convert "None" to None for those Radio types
+    idf_type, dl_type, norm = (
+        None if _ in (None, "", "None") else _ for _ in (idf_type, dl_type, norm)
+    )
+    # logger.info("file1: *%s*, file2: *%s*", file1, file2)
+    if file2 is not None:
+        logger.info("file1.name: *%s*, file2.name: *%s*", Path(file1).name, Path(file2).name)
+    else:
+        logger.info("file1.name: *%s*, file2: *%s*", Path(file1).name, file2)
+    # bypass if file1 or file2 is str input
+    # if not (isinstance(file1, str) or isinstance(file2, str)):
+    text1 = file2text(file1)
+    if file2 is None:
+        logger.debug("file2 is None")
+        text2 = ""
+    else:
+        logger.debug("file2.name: %s", Path(file2).name)
+        text2 = file2text(file2)
+    # if not text1.strip() or not text2.strip():
+    if not text1.strip():
+        msg = (
+            "file 1 is apparently empty... Upload a non-empty file and try again."
+            # f"text1[:10]: [{text1[:10]}], "
+            # f"text2[:10]: [{text2[:10]}]"
+        )
+        return error_msg(msg)
+    # single file
+    # when text2 is empty
+    # process file1/text1: split text1 to text1 text2 to zh-en
+    len_max = 2000
+    if not text2.strip():  # empty file2
+        _ = [elm.strip() for elm in text1.splitlines() if elm.strip()]
+        if not _:  # essentially empty file1
+            return error_msg("Nothing worthy of processing in file 1")
+        logger.info(
+            "single file: len %s, max %s",
+            len(_), 2 * len_max
+        )
+        # exit if there are too many lines
+        if len(_) > 2 * len_max:
+            return error_msg(f" Too many lines ({len(_)}) > {2 * len_max}, alignment op halted, sorry.", "info")
+        _ = zip_longest(_, [""])
+        _ = pd.DataFrame(_, columns=["text1", "text2"])
+        df_trimmed = trim_df(_)
+        # text1 = loadtext("data/test-dual.txt")
+        list1, list2 = text2lists(text1)
+        lang1 = text2lists.lang1
+        lang2 = text2lists.lang2
+        offset = text2lists.offset  # noqa
+        _ = """
+        ax = sns.heatmap(lists2cmat(list1, list2), cmap="gist_earth_r")  # ax=plt.gca()
+        ax.invert_yaxis()
+        ax.set(
+            xlabel=lang1,
+            ylabel=lang2,
+            title=f"cos similarity heatmap \n(offset={offset})",
+        )
+        plt_loc = "img/plt.png"
+        plt.savefig(plt_loc)
+        # """
+        # output_plot = plt_loc  # for gr.outputs.Image
+        #
+        _ = zip_longest(list1, list2, fillvalue="")
+        df_aligned = pd.DataFrame(
+            _,
+            columns=["text1", "text2"]
+        )
+        file_dl = Path(f"{Path(file1).stem[:-8]}-{lang1}-{lang2}.csv")
+        file_dl_xlsx = Path(
+            f"{Path(file1).stem[:-8]}-{lang1}-{lang2}.xlsx"
+        )
+        # return  df_trimmed, output_plot, file_dl, file_dl_xlsx, df_aligned
+    # end if single file
+    # not single file
+    else:  # file1 file 2: proceed
+        fastlid.set_languages = None
+        lang1, _ = fastlid(text1)
+        lang2, _ = fastlid(text2)
+        df1 = files2df(file1, file2)
+        list1 = [elm for elm in df1.text1 if elm]
+        list2 = [elm for elm in df1.text2 if elm]
+        # len1 = len(list1)  # noqa
+        # len2 = len(list2)  # noqa
+        # exit if there are too many lines
+        len12 = len(list1) + len(list2)
+        logger.info(
+            "fast track: len1 %s, len2 %s, tot %s, max %s",
+            len(list1), len(list2), len(list1) + len(list2), 3 * len_max
+        )
+        if len12 > 3 * len_max:
+            return error_msg(f" Too many lines ({len(list1)} + {len(list2)} > {3 * len_max}), alignment op halted, sorry.", "info")
+        file_dl = Path(f"{Path(file1).stem[:-8]}-{Path(file2).stem[:-8]}.csv")
+        file_dl_xlsx = Path(
+            f"{Path(file1).stem[:-8]}-{Path(file2).stem[:-8]}.xlsx"
+        )
+        df_trimmed = trim_df(df1)
+    # --- end else single
+    lang_en_zh = ["en", "zh"]
+    logger.debug("lang1: %s, lang2: %s", lang1, lang2)
+    if debug:
+        ic(f" lang1: {lang1}, lang2: {lang2}")
+        ic("fast track? ", lang1 in lang_en_zh and lang2 in lang_en_zh)
+    # fast track
+    if lang1 in lang_en_zh and lang2 in lang_en_zh:
+        try:
+            cmat = lists2cmat(
+                list1,
+                list2,
+                tf_type=tf_type,
+                idf_type=idf_type,
+                dl_type=dl_type,
+                norm=norm,
+            )
+        except Exception as exc:
+            logger.error(exc)
+            return error_msg(exc)
+    # slow track
+    else:
+        logger.info(
+            "slow track: len1 %s, len2 %s, tot: %s, max %s",
+            len(list1), len(list2), len(list1) + len(list2),
+            3 * len_max
+        )
+        if len(list1) + len(list2) > 3 * len_max:
+            msg = (
+                f" len1 {len(list1)} + len2 {len(list2)} > {3 * len_max}. "
+                "This will take too long to complete "
+                "and will hog this experimental server and hinder "
+                "other users from trying the service. "
+                "Aborted...sorry"
+            )
+            return error_msg(msg, "info")
+        try:
+            from radiobee.model_s import model_s  # pylint: disable=import-outside-toplevel
+            vec1 = model_s.encode(list1)
+            vec2 = model_s.encode(list2)
+            # cmat = vec1.dot(vec2.T)
+            cmat = vec2.dot(vec1.T)
+        except Exception as exc:
+            logger.error(exc)
+            _ = inspect.currentframe().f_lineno  # type: ignore
+            return error_msg(
+                f"{exc}, {Path(__file__).name} ln{_}, period"
+            )
+    tset = pd.DataFrame(cmat2tset(cmat))
+    tset.columns = ["x", "y", "cos"]
+    _ = """
+    df_trimmed = pd.concat(
+        [
+            df1.iloc[:4, :],
+            pd.DataFrame(
+                [
+                    [
+                        "...",
+                        "...",
+                    ]
+                ],
+                columns=df1.columns,
+            ),
+            df1.iloc[-4:, :],
+        ],
+        ignore_index=1,
+    )
+    # """
+    # process list1, list2 to obtain df_aligned
+    # quick fix ValueError: not enough values to unpack (expected at least 1, got 0)
+    # fixed in gen_pset, but we leave the loop here
+    for min_s in range(min_samples):
+        logger.info(" min_samples, using %s", min_samples - min_s)
+        try:
+            pset = gen_pset(
+                cmat,
+                eps=eps,
+                min_samples=min_samples - min_s,
+                delta=7,
+            )
+            break
+        except ValueError:
+            logger.info(" decrease min_samples by %s", min_s + 1)
+            continue
+        except Exception as e:
+            logger.error(e)
+            continue
+    else:
+        # break should happen above when min_samples = 2
+        raise Exception("bummer, this shouldn't happen, probably another bug")
+    min_samples = gen_pset.min_samples
+    # will result in error message:
+    # UserWarning: Starting a Matplotlib GUI outside of
+    # the main thread will likely fail."
+    _ = """
+    plot_cmat(
+        cmat,
+        eps=eps,
+        min_samples=min_samples,
+        xlabel=lang1,
+        ylabel=lang2,
+    )
+    # """
+    # move plot_cmat's code to the main thread here
+    # to make it work
+    xlabel = lang1
+    ylabel = lang2
+    len1, len2 = cmat.shape
+    ylim, xlim = len1, len2
+    # does not seem to show up
+    ic(f" len1 (ylim): {len1}, len2 (xlim): {len2}")
+    logger.debug(" len1 (ylim): %s, len2 (xlim): %s", len1, len2)
+    if debug:
+        print(f" len1 (ylim): {len1}, len2 (xlim): {len2}")
+    df_ = pd.DataFrame(cmat2tset(cmat))
+    df_.columns = ["x", "y", "cos"]
+    sns.set()
+    sns.set_style("darkgrid")
+    # close all existing figures, necessary for hf spaces
+    plt.close("all")
+    # if sys.platform not in ["win32", "linux"]:
+    # going for noninteractive
+    # to cater for Mac, thanks to WhiteFox
+    plt.switch_backend("Agg")
+    # figsize=(13, 8), (339, 212) mm on '1280x800+0+0'
+    fig = plt.figure(figsize=(13, 8))
+    # gs = fig.add_gridspec(2, 2, wspace=0.4, hspace=0.58)
+    gs = fig.add_gridspec(1, 2, wspace=0.4, hspace=0.58)
+    ax_heatmap = fig.add_subplot(gs[0, 0])  # ax2
+    ax0 = fig.add_subplot(gs[0, 1])
+    # ax1 = fig.add_subplot(gs[1, 0])
+    cmap = "viridis_r"
+    sns.heatmap(cmat, cmap=cmap, ax=ax_heatmap).invert_yaxis()
+    ax_heatmap.set_xlabel(xlabel)
+    ax_heatmap.set_ylabel(ylabel)
+    ax_heatmap.set_title("cos similarity heatmap")
+    fig.suptitle(f"alignment projection\n(eps={eps}, min_samples={min_samples})")
+    _ = DBSCAN(min_samples=min_samples, eps=eps).fit(df_).labels_ > -1
+    # _x = DBSCAN(min_samples=min_samples, eps=eps).fit(df_).labels_ < 0
+    _x = ~_
+    # max cos along columns
+    df_.plot.scatter("x", "y", c="cos", cmap=cmap, ax=ax0)
+    # outliers
+    df_[_x].plot.scatter("x", "y", c="r", marker="x", alpha=0.6, ax=ax0)
+    ax0.set_xlabel(xlabel)
+    ax0.set_ylabel(ylabel)
+    ax0.set_xlim(xmin=0, xmax=xlim)
+    ax0.set_ylim(ymin=0, ymax=ylim)
+    ax0.set_title(
+        "max along columns (x: outliers)\n"
+        "potential aligned pairs (green line), "
+        f"{round(sum(_) / xlim, 2):.0%}"
+    )
+    plt_loc = "img/plt.png"
+    ic(f" plotting to {plt_loc}")
+    plt.savefig(plt_loc)
+    # clustered
+    # df_[_].plot.scatter("x", "y", c="cos", cmap=cmap, ax=ax1)
+    # ax1.set_xlabel(xlabel)
+    # ax1.set_ylabel(ylabel)
+    # ax1.set_xlim(0, len1)
+    # ax1.set_title(f"potential aligned pairs ({round(sum(_) / len1, 2):.0%})")
+    # end of plot_cmat
+    src_len, tgt_len = cmat.shape
+    aset = gen_aset(pset, src_len, tgt_len)
+    final_list = align_texts(aset, list2, list1)  # note the order
+    # df_aligned
+    df_aligned = pd.DataFrame(final_list, columns=["text1", "text2", "likelihood"])
+    # swap text1 text2
+    df_aligned = df_aligned[["text2", "text1", "likelihood"]]
+    df_aligned.columns = ["text1", "text2", "likelihood"]
+    ic("paras aligned: ", df_aligned.head(10))
+    # round the last column to 2
+    # df_aligned.likelihood = df_aligned.likelihood.round(2)
+    # df_aligned = df_aligned.round({"likelihood": 2})
+    # df_aligned.likelihood = df_aligned.likelihood.apply(lambda x: np.nan if x in [""] else x)
+    if len(df_aligned) > 200:
+        df_html = None
+    else:  # show a one-batch table in html
+        # style
+        styled = df_aligned.style.set_properties(
+            **{
+                "font-size": "10pt",
+                "border-color": "black",
+                "border": "1px black solid !important"
+            }
+            # border-color="black",
+        ).set_table_styles([{
+            "selector": "",  # noqa
+            "props": [("border", "2px black solid !important")]}]  # noqa
+        ).format(precision=2)  # Styler.set_precision is deprecated
+        # .bar(subset="likelihood", color="#5fba7d")
+        # .background_gradient("Greys")
+        # df_html = df_aligned.to_html()
+        # df_html = styled.to_html()
+        df_html = styled.to_html()  # Styler.render is deprecated
+    # ===
+    if plot_dia:
+        output_plot = "img/plt.png"
+    else:
+        output_plot = None
+    _ = df_aligned.to_csv(index=False)
+    file_dl.write_text(_, encoding="utf8")
+    # file_dl.write_text(_, encoding="gb2312")  # no go
+    df_aligned.to_excel(file_dl_xlsx)
+    # return df_trimmed, plt
+    # return df_trimmed, plt, file_dl, file_dl_xlsx, df_aligned
+    # output_plot: gr.outputs.Image(type="auto", label="...")
+    # return df_trimmed, output_plot, file_dl, file_dl_xlsx, df_aligned
+    # return df_trimmed, output_plot, file_dl, file_dl_xlsx, styled, df_html  # gradio can't handle style
+    ic("sent-ali-algo: ", sent_ali_algo)
+    # ### sent-ali-algo is None: para align
+    if sent_ali_algo in ["None"]:
+        ic("returning para-ali outputs")
+        return df_trimmed, output_plot, file_dl, file_dl_xlsx, None, None, df_aligned, df_html
+    # ### proceed with sent align
+    if sent_ali_algo in ["fast"]:
+        ic(sent_ali_algo)
+        align_func = align_sents
+        ic(df_aligned.shape, df_aligned.columns)
+        aligned_sents = paras2sents(df_aligned, align_func)
+        # ic(pd.DataFrame(aligned_sents).shape, aligned_sents)
+        ic(pd.DataFrame(aligned_sents).shape)
+        df_aligned_sents = pd.DataFrame(aligned_sents, columns=["text1", "text2"])
+    else:  # ["slow"]
+        ic(sent_ali_algo)
+        align_func = shuffle_sents
+        aligned_sents = paras2sents(df_aligned, align_func, lang1, lang2)
+        # add extra entry if necessary
+        aligned_sents = [list(sent) + [""] if len(sent) == 2 else list(sent) for sent in aligned_sents]
+        df_aligned_sents = pd.DataFrame(aligned_sents, columns=["text1", "text2", "likelihood"])
+    # prepare sents downloads
+    file_dl_sents = Path(f"{file_dl.stem}-sents{file_dl.suffix}")
+    file_dl_xlsx_sents = Path(f"{file_dl_xlsx.stem}-sents{file_dl_xlsx.suffix}")
+    _ = df_aligned_sents.to_csv(index=False)
+    file_dl_sents.write_text(_, encoding="utf8")
+    df_aligned_sents.to_excel(file_dl_xlsx_sents)
+    # prepare html output
+    if len(df_aligned_sents) > 200:
+        df_html = None
+    else:  # show a one-batch table in html
+        # style
+        styled = df_aligned_sents.style.set_properties(
+            **{
+                "font-size": "10pt",
+                "border-color": "black",
+                "border": "1px black solid !important"
+            }
+            # border-color="black",
+        ).set_table_styles([{
+            "selector": "",  # noqa
+            "props": [("border", "2px black solid !important")]}]  # noqa
+        ).format(
+            precision=2
+        )
+        df_html = styled.to_html()
+    # aligned sents outputs
+    ic("aligned sents outputs")
+    # return df_trimmed, output_plot, file_dl, file_dl_xlsx, None, None, df_aligned, df_html
+    return df_trimmed, output_plot, file_dl, file_dl_xlsx, file_dl_sents, file_dl_xlsx_sents, df_aligned_sents, df_html
+if __name__ == "__main__":
+    # typer.run(radiobee_cli)
+    app()
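
Note on the sentence-level download names in the diff above: they are built from `file_dl.stem` and `file_dl.suffix` only, so any parent directory of `file_dl` is dropped and the `-sents` files land in the current working directory. The sketch below illustrates that behaviour next to a variant that keeps the directory. The helper names and the `downloads/aligned.csv` path are hypothetical, chosen for illustration; they are not part of the commit.

```python
from pathlib import Path


def sents_path(path: Path) -> Path:
    """Same construction as in the diff: stem + "-sents" + suffix.

    The parent directory is not carried over.
    """
    return Path(f"{path.stem}-sents{path.suffix}")


def sents_path_keep_dir(path: Path) -> Path:
    """Hypothetical variant: same file name, kept next to the original."""
    return path.with_name(f"{path.stem}-sents{path.suffix}")


if __name__ == "__main__":
    file_dl = Path("downloads/aligned.csv")  # hypothetical example path
    print(sents_path(file_dl))
    print(sents_path_keep_dir(file_dl))
```

If the download files are always created in the working directory, the two helpers give the same result and the construction in the diff is sufficient.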

radiobee/trim_df.py CHANGED

@@ -14,12 +14,8 @@ def trim_df(
             [
                 df1.iloc[:len_, :],
                 pd.DataFrame(
-                    [
-                        [
-                            "...",
-                            "...",
-                        ]
-                    ],
                     columns=df1.columns,
                 ),
                 df1.iloc[-len_:, :],

             [
                 df1.iloc[:len_, :],
                 pd.DataFrame(
+                    # [["...", "...",]],
+                    [["..."] * len(df1.columns)],
                     columns=df1.columns,
                 ),
                 df1.iloc[-len_:, :],
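
The change above replaces a hard-coded two-cell placeholder row with one sized to the number of columns, which matters once the frame has a third column such as `likelihood`. A minimal stdlib-only sketch of the same idea, using plain lists in place of a DataFrame; `trim_rows`, its default `len_`, and the rule for short inputs are assumptions for illustration, not the repository's `trim_df`.

```python
from typing import List, Sequence


def trim_rows(
    rows: Sequence[List[str]],
    n_cols: int,
    len_: int = 2,
    filler: str = "...",
) -> List[List[str]]:
    """Keep the first and last `len_` rows, with one placeholder row between.

    The placeholder has exactly `n_cols` cells, mirroring
    `[["..."] * len(df1.columns)]` in the diff.
    """
    if len(rows) <= 2 * len_:  # assumption: short inputs are returned as-is
        return [list(row) for row in rows]
    placeholder = [filler] * n_cols
    head = [list(row) for row in rows[:len_]]
    tail = [list(row) for row in rows[-len_:]]
    return head + [placeholder] + tail


if __name__ == "__main__":
    # Hypothetical three-column data: text1, text2, likelihood
    data = [[f"a{i}", f"b{i}", "0.5"] for i in range(6)]
    for row in trim_rows(data, n_cols=3):
        print(row)
```

With a fixed two-cell placeholder, the same call on three-column data would produce a row of the wrong width, which is the mismatch the commit avoids.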

requirements.txt CHANGED

@@ -27,4 +27,7 @@ nltk
 sentence_splitter
 icecream
 # lazy
-alive-progress

 sentence_splitter
 icecream
 # lazy
+alive-progress
+# cli
+click