vitorbborges committed 9901627 (parent: 89b9a42): Update README.md

Files changed (1): README.md (+543 -1)
sdk: gradio
sdk_version: 3.9.1
app_file: app.py
pinned: false
language: en
tags:
- Recommendation
license: apache-2.0
datasets:
- surprise
- numpy
- keras
- pandas
thumbnail: https://github.com/Marcosdib/S2Query/Classification_Architecture_model.png
---

![MCTIimg](https://antigo.mctic.gov.br/mctic/export/sites/institucional/institucional/entidadesVinculadas/conselhos/pag-old/RODAPE_MCTI.png)

# MCTI Recommendation Task (uncased) DRAFT

Disclaimer: The Brazilian Ministry of Science, Technology, and Innovation (MCTI) has partially supported this project.

The model [NLP MCTI Recommendation Multi](https://huggingface.co/spaces/unb-lamfo-nlp-mcti/nlp-mcti-lda-recommender) is part of the project [Research Financing Product Portfolio (FPP)](https://huggingface.co/unb-lamfo-nlp-mcti). It focuses on the recommendation task and explores different machine learning strategies for suggesting items that are likely to be useful to a particular individual. Several methods were compared against each other by their error estimates. A simulated dataset was created using an LDA model.

## Abstract

This model card describes the model and its classes. It also presents the intended uses, along with a "How to use" section that states the conditions the input data must meet.
Further on, the card discusses the data and its limitations and biases; tables throughout the page support the information and the tests that were run.
How recommendations are made, which datasets were used, and the resulting benchmarks are all covered in the card.

## Model description

The surprise library provides 11 rating-prediction algorithms that estimate the rating a user would give to an item, most of them based on collaborative-filtering techniques.
The algorithms are listed below with a brief explanation; for more information, please refer to the package [documentation](https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html).

- `random_pred.NormalPredictor`: Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal.
- `baseline_only.BaselineOnly`: Algorithm predicting the baseline estimate for a given user and item.
- `knns.KNNBasic`: A basic collaborative filtering algorithm.
- `knns.KNNWithMeans`: A basic collaborative filtering algorithm, taking into account the mean ratings of each user.
- `knns.KNNWithZScore`: A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.
- `knns.KNNBaseline`: A basic collaborative filtering algorithm taking into account a baseline rating.
- `matrix_factorization.SVD`: The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize.
- `matrix_factorization.SVDpp`: The SVD++ algorithm, an extension of SVD taking into account implicit ratings.
- `matrix_factorization.NMF`: A collaborative filtering algorithm based on Non-negative Matrix Factorization.
- `slope_one.SlopeOne`: A simple yet accurate collaborative filtering algorithm.
- `co_clustering.CoClustering`: A collaborative filtering algorithm based on co-clustering.
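
To make the bias terms concrete, here is a minimal pure-Python sketch of the baseline estimate `b_ui = mu + b_u + b_i` that `BaselineOnly` predicts and that `KNNBaseline` builds on. It is not Surprise's implementation: Surprise fits the biases iteratively with ALS or SGD, whereas this sketch computes them in a single regularized pass (item biases first, then user biases). The function names `baseline_estimates` and `predict`, and the regularization defaults (10 for items, 15 for users, loosely borrowed from Surprise's ALS defaults), are assumptions for illustration.

```python
from collections import defaultdict


def baseline_estimates(ratings, reg_item=10.0, reg_user=15.0):
    """Return (mu, b_u, b_i) for a list of (user, item, rating) triples.

    Illustrative single-pass sketch with shrunken means, not Surprise's ALS/SGD fit.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Item bias: mean deviation of the item's ratings from the global mean, shrunk by reg_item
    item_dev = defaultdict(list)
    for _, i, r in ratings:
        item_dev[i].append(r - mu)
    b_i = {i: sum(d) / (reg_item + len(d)) for i, d in item_dev.items()}

    # User bias: what is left after removing the global mean and the item bias, shrunk by reg_user
    user_dev = defaultdict(list)
    for u, i, r in ratings:
        user_dev[u].append(r - mu - b_i[i])
    b_u = {u: sum(d) / (reg_user + len(d)) for u, d in user_dev.items()}
    return mu, b_u, b_i


def predict(mu, b_u, b_i, user, item):
    # Unknown users or items contribute a zero bias
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)


ratings = [('alice', 'a', 5), ('alice', 'b', 4), ('bob', 'a', 4),
           ('bob', 'c', 1), ('carol', 'b', 5), ('carol', 'c', 2)]
mu, b_u, b_i = baseline_estimates(ratings, reg_item=0.0, reg_user=0.0)
print(round(predict(mu, b_u, b_i, 'alice', 'c'), 3))  # prints 1.5
```

Setting both regularization constants to 0 gives plain mean deviations, which keeps the toy numbers easy to check by hand; larger values shrink the biases of rarely rated users and items toward 0.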

The `Method` class below accepts a custom dataframe as an argument. The dataframe must have three columns named `['userID', 'itemID', 'rating']`.
```python
import pandas as pd
import surprise
from surprise import Dataset, Reader
from surprise.model_selection import train_test_split


class Method:
    def __init__(self, df):
        self.df = df
        self.available_methods = [
            'surprise.NormalPredictor',
            'surprise.BaselineOnly',
            'surprise.KNNBasic',
            'surprise.KNNWithMeans',
            'surprise.KNNWithZScore',
            'surprise.KNNBaseline',
            'surprise.SVD',
            'surprise.SVDpp',
            'surprise.NMF',
            'surprise.SlopeOne',
            'surprise.CoClustering',
        ]

    def show_methods(self):
        print('The available methods are:')
        for i, method in enumerate(self.available_methods):
            print(str(i) + ': ' + method)

    def run(self, the_method):
        self.the_method = the_method
        if self.the_method.startswith('surprise'):
            self.run_surprise()
        elif self.the_method.startswith('Gensim'):
            self.run_gensim()  # not defined in this card
        elif self.the_method.startswith('Transformers-'):
            self.run_transformers()  # not defined in this card
        else:
            print('This method is not defined! Try another one.')

    def run_surprise(self):
        # Ratings are read on a 1-5 scale (jester uses -10 to 10 and would need its own scale)
        reader = Reader(rating_scale=(1, 5))
        data = Dataset.load_from_df(self.df[['userID', 'itemID', 'rating']], reader)
        trainset, testset = train_test_split(data, test_size=.30)
        # Look the algorithm class up by name, e.g. 'surprise.SVD' -> surprise.SVD
        the_method = self.the_method.replace('surprise.', '')
        the_algorithm = getattr(surprise, the_method)()
        the_algorithm.fit(trainset)
        self.predictions = the_algorithm.test(testset)
        # Each surprise Prediction is (uid, iid, r_ui, est, details); drop the details
        list_predictions = [(uid, iid, r_ui, est) for uid, iid, r_ui, est, _ in self.predictions]
        self.predictions_df = pd.DataFrame(list_predictions,
                                           columns=['user_id', 'item_id', 'rating', 'predicted_rating'])
```
Every model was run and evaluated; compared against each other, the methods produced different error estimates.

The surprise library provides four metrics for assessing the accuracy of rating predictions: RMSE, MSE, MAE, and FCP. For further discussion of each metric, please see the package documentation.

```python
import surprise
from surprise import accuracy


class Evaluator:
    def __init__(self, predictions_df):
        self.available_evaluators = ['surprise.rmse', 'surprise.mse',
                                     'surprise.mae', 'surprise.fcp']
        self.predictions_df = predictions_df

    def show_evaluators(self):
        print('The available evaluators are:')
        for i, evaluator in enumerate(self.available_evaluators):
            print(str(i) + ': ' + evaluator)

    def run(self, the_evaluator):
        self.the_evaluator = the_evaluator
        if self.the_evaluator.startswith('surprise'):
            self.run_surprise()
        else:
            print('This evaluator is not available!')

    def run_surprise(self):
        # Rebuild surprise Prediction tuples (uid, iid, r_ui, est, details) from the dataframe
        predictions = [
            surprise.prediction_algorithms.predictions.Prediction(
                row['user_id'], row['item_id'], row['rating'], row['predicted_rating'], {})
            for _, row in self.predictions_df.iterrows()
        ]
        self.predictions = predictions
        # Look the metric up by name, e.g. 'surprise.mse' -> accuracy.mse
        metric_name = self.the_evaluator.replace('surprise.', '')
        self.the_evaluator = 'accuracy.' + metric_name
        self.acc = getattr(accuracy, metric_name)(predictions, verbose=True)
```
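
For intuition about the four metrics named above, here is a plain-Python sketch over hypothetical `(user, true_rating, estimate)` triples. RMSE, MSE and MAE follow their standard definitions. For FCP, the sketch counts, within each user, the item pairs whose estimates are ordered like the true ratings (concordant) or not (discordant, including tied estimates), sums those counts over users, and returns concordant / (concordant + discordant), following Koren and Sill's definition that the Surprise documentation cites; Surprise's own aggregation may differ in detail, so treat this as an illustration rather than a drop-in replacement for `surprise.accuracy`.

```python
import math
from collections import defaultdict
from itertools import combinations


def mse(preds):
    """Mean squared error over (user, true_rating, estimate) triples."""
    return sum((t - e) ** 2 for _, t, e in preds) / len(preds)


def rmse(preds):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(preds))


def mae(preds):
    """Mean absolute error."""
    return sum(abs(t - e) for _, t, e in preds) / len(preds)


def fcp(preds):
    """Fraction of concordant pairs, counted within each user and summed over users."""
    by_user = defaultdict(list)
    for user, t, e in preds:
        by_user[user].append((t, e))

    concordant = discordant = 0
    for pairs in by_user.values():
        for (t1, e1), (t2, e2) in combinations(pairs, 2):
            if t1 == t2:
                continue  # equal true ratings say nothing about order
            if (t1 - t2) * (e1 - e2) > 0:
                concordant += 1  # estimates ordered like the true ratings
            else:
                discordant += 1  # reversed order, or tied estimates
    if concordant + discordant == 0:
        raise ValueError('FCP is undefined: no user has two items with different true ratings')
    return concordant / (concordant + discordant)


preds = [('u1', 4, 3.5), ('u1', 2, 2.5), ('u1', 5, 4.0),
         ('u2', 3, 2.0), ('u2', 1, 2.5)]
print(f'RMSE={rmse(preds):.4f} MSE={mse(preds):.4f} MAE={mae(preds):.4f} FCP={fcp(preds):.4f}')
# RMSE=0.9747 MSE=0.9500 MAE=0.9000 FCP=0.7500
```

Note that FCP only looks at the ordering of items within each user, which is why it can move independently of the error-based metrics in the benchmark tables.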
## Intended uses
You can use the raw model for either masked language modeling or next sentence prediction, but it is mostly intended to be fine-tuned on a downstream task. See the [model hub](https://www.google.com) to look for fine-tuned versions of a task that interests you.
Note that this model is primarily aimed at being fine-tuned on tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation, you should look at a model like XXX.
### How to use
The datasets for collaborative filtering must meet the following conditions:
- They must be a dataframe containing the ratings.
- The dataframe must have three columns corresponding to the user (raw) IDs, the item (raw) IDs, and the ratings, in this order.

The databases ml_100k, ml_1m, and jester are built into the surprise package for collaborative filtering; the remaining entries of `available_databases` are the simulated datasets described below.
```python
import pandas as pd
import numpy as np


class Data:
    def __init__(self):
        self.available_databases = ['ml_100k', 'ml_1m', 'jester', 'lda_topics', 'lda_rankings', 'uniform']

    def show_available_databases(self):
        print('The available databases are:')
        for i, database in enumerate(self.available_databases):
            print(str(i) + ': ' + database)

    def read_data(self, database_name):
        self.database_name = database_name
        # Dispatch to the matching reader, e.g. 'ml_100k' -> self.read_ml_100k()
        self.the_data_reader = getattr(self, 'read_' + database_name.lower())
        self.the_data_reader()

    def read_ml_100k(self):
        from surprise import Dataset
        data = Dataset.load_builtin('ml-100k')
        # Keep (userID, itemID, rating) and drop the timestamp
        self.df = pd.DataFrame(data.raw_ratings, columns=['user_id', 'item_id', 'rating', 'timestamp'])
        self.df.drop(columns=['timestamp'], inplace=True)
        self.df.rename({'user_id': 'userID', 'item_id': 'itemID'}, axis=1, inplace=True)

    def read_ml_1m(self):
        from surprise import Dataset
        data = Dataset.load_builtin('ml-1m')
        self.df = pd.DataFrame(data.raw_ratings, columns=['user_id', 'item_id', 'rating', 'timestamp'])
        self.df.drop(columns=['timestamp'], inplace=True)
        self.df.rename({'user_id': 'userID', 'item_id': 'itemID'}, axis=1, inplace=True)

    def read_jester(self):
        from surprise import Dataset
        data = Dataset.load_builtin('jester')  # jester ratings range from -10 to 10
        self.df = pd.DataFrame(data.raw_ratings, columns=['user_id', 'item_id', 'rating', 'timestamp'])
        self.df.drop(columns=['timestamp'], inplace=True)
        self.df.rename({'user_id': 'userID', 'item_id': 'itemID'}, axis=1, inplace=True)
```

Hyperparameters:

- `n_users`: number of simulated users in the database;
- `n_ratings`: number of simulated rating events in the database.

This fictional dataset is built by repeatedly choosing a random simulated user of the recommender system being designed in this research project, a random opportunity, and a uniformly distributed random rating (from 1 to 5).
```python
    def read_uniform(self):
        import random

        n_users = 20
        n_ratings = 10000

        opo = pd.read_csv('../oportunidades.csv')
        # randint is inclusive at both ends, so ratings cover 1..5 as described above
        df = [(random.randrange(n_users), random.randrange(len(opo)), random.randint(1, 5))
              for _ in range(n_ratings)]
        self.df = pd.DataFrame(df, columns=['userID', 'itemID', 'rating'])
```
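
The `read_uniform` method above reads `../oportunidades.csv` only to know how many opportunities exist. As a self-contained sketch, the same simulation can be written against a hypothetical `n_items` count, with a private seeded generator so that a draw can be reproduced; `simulate_uniform` and `n_items` are illustrative names, not part of the project.

```python
import random


def simulate_uniform(n_users, n_items, n_ratings, seed=None):
    """Return (userID, itemID, rating) tuples drawn independently and uniformly."""
    rng = random.Random(seed)  # private generator: the same seed gives the same rows
    return [(rng.randrange(n_users),   # user id in 0..n_users-1
             rng.randrange(n_items),   # item id in 0..n_items-1
             rng.randint(1, 5))        # rating in 1..5, both ends inclusive
            for _ in range(n_ratings)]


rows = simulate_uniform(n_users=20, n_items=500, n_ratings=10000, seed=42)
ratings = [r for _, _, r in rows]
print(len(rows), min(ratings), max(ratings))
```

Because each rating is independent of both the user and the item, one would expect no algorithm to find real structure in this data, which is what makes it useful as a benchmark.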

Hyperparameters:

- `n_users`: number of simulated users in the database;
- `n_ratings`: number of simulated rating events in the database.

This first LDA-based dataset builds a model with K = `n_users` topics. LDA topics serve as proxies for simulated users with different clusters of interest. First, a random opportunity is chosen; then the proportion of a randomly chosen topic in its description is multiplied by five, and the ceiling of the result is the rating that the fictional user gives to that opportunity. Because the model spreads each description's probability mass across many topics, an opportunity rarely has a high proportion of any single topic. As a result, this dataset has very low variability, and most ratings are equal to 1.
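
The rating rule just described can be isolated in a few lines. This sketch takes a hypothetical topic distribution for one opportunity (a dict from topic id to proportion, the shape `read_lda_topics` builds from `get_document_topics`) and shows why most ratings come out as 1: any proportion up to 0.2 maps to a rating of 1. `topic_rating` is an illustrative name.

```python
import math


def topic_rating(topics, user):
    """Rating given by simulated user `user` (an LDA topic id): ceil(5 * topic proportion)."""
    return math.ceil(topics[user] * 5)


# Hypothetical distribution: the probability mass is spread across several topics
topics = {0: 0.05, 3: 0.12, 7: 0.48, 11: 0.35}
print({user: topic_rating(topics, user) for user in topics})
# {0: 1, 3: 1, 7: 3, 11: 2}
```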
```python
    def read_lda_topics(self):
        import gensim
        import random
        import math

        n_users = 20
        n_ratings = 10000

        opo = pd.read_csv('../oportunidades_results.csv')
        # opo = opo.iloc[np.where(opo['opo_brazil']=='Y')]

        # Load the LDA model with n_users topics, training and saving it first if it is missing
        try:
            lda_model = gensim.models.ldamodel.LdaModel.load(f'models/lda_model{n_users}.model')
        except OSError:
            import generate_users  # project-specific module that trains the LDA model
            generate_users.gen_model(n_users)
            lda_model = gensim.models.ldamodel.LdaModel.load(f'models/lda_model{n_users}.model')

        df = []
        for _ in range(n_ratings):
            # Pick a random opportunity and compute its topic distribution
            opo_n = random.randrange(len(opo))
            txt = opo.loc[opo_n, 'opo_texto']
            opo_bow = lda_model.id2word.doc2bow(txt.split())
            topics = dict(lda_model.get_document_topics(opo_bow))
            # A random topic present in the description plays the simulated user
            user = random.choice(list(topics))
            rating = math.ceil(topics[user] * 5)
            df.append((user, opo_n, rating))
        self.df = pd.DataFrame(df, columns=['userID', 'itemID', 'rating'])

    def read_lda_rankings(self):
        import gensim
        import random
        import math
        import tqdm

        n_users = 9
        n_ratings = 1000

        # Keep only Brazilian opportunities and renumber them 0..n-1
        opo = pd.read_csv('../oportunidades.csv')
        opo = opo[opo['opo_brazil'] == 'Y'].reset_index(drop=True)

        path = f'models/output_linkedin_cle_lda_model_{n_users}_topics_symmetric_alpha_auto_beta'
        lda_model = gensim.models.ldamodel.LdaModel.load(path)

        df = []
        for _ in tqdm.tqdm(range(n_ratings)):
            opo_n = random.randrange(len(opo))
            txt = opo.loc[opo_n, 'opo_texto']
            opo_bow = lda_model.id2word.doc2bow(txt.split())
            topics = dict(lda_model.get_document_topics(opo_bow))
            # Rank the topics by ascending proportion; rank k of n becomes ceil(5 * k / n)
            prop = pd.DataFrame([topics], index=['prop']).T.sort_values('prop', ascending=True)
            prop['rating'] = range(1, len(prop) + 1)
            prop['rating'] = prop['rating'] / len(prop)
            prop['rating'] = prop['rating'].apply(lambda x: math.ceil(x * 5))
            prop.reset_index(inplace=True)
            # A random topic plays the simulated user and gives its rank-based rating
            prop = prop.sample(1)
            df.append((prop['index'].values[0], opo_n, prop['rating'].values[0]))
        self.df = pd.DataFrame(df, columns=['userID', 'itemID', 'rating'])
```
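
`read_lda_rankings` replaces the raw topic proportion with the topic's rank inside the description. The sketch below restates that rule without pandas, as we read the code above: topics are sorted by ascending proportion, the k-th of n topics (k = 1..n) gets rank k/n, and the rating is ceil(5 · k/n), so the dominant topic always rates 5 however diffuse the distribution is. `rank_ratings` and the sample distribution are illustrative.

```python
import math


def rank_ratings(topics):
    """Map each topic id to ceil(5 * k / n), where k is its rank by ascending proportion."""
    ordered = sorted(topics, key=topics.get)  # least to most prominent topic
    n = len(ordered)
    return {topic: math.ceil((k / n) * 5) for k, topic in enumerate(ordered, start=1)}


topics = {0: 0.05, 3: 0.12, 7: 0.48, 11: 0.35}
print(rank_ratings(topics))
# {0: 2, 3: 3, 11: 4, 7: 5}
```

Compared with the proportion-based rule of `read_lda_topics`, this rank-based rule spreads ratings more evenly over the 1 to 5 scale.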

### Limitations and bias

In this model we faced some obstacles; we overcame some of them, but others, by the nature of the project, could not be fully solved.
Databases containing profiles of potential users of the planned prototype are not available.
For this reason, we had to run simulations representing the interests of these users so that the recommendation system could be modeled.
We simulated clusters of latent interests based on the topics present in the texts describing financial products. Because we built the dataset ourselves, there has been no interaction yet between users and the dataset; we therefore do not have realistic ratings, which makes the results less reliable.

Later on, we used a database of scraped LinkedIn profiles.
The problem is that the profiles LinkedIn shows are biased: the profiles that appear are geographically close to the user, or related to the user's organization and email.

## Training data
To train the Latent Dirichlet allocation (LDA) model, we used a database of scraped researcher profiles from LinkedIn.
## Training procedure

## Evaluation results

## Checkpoints

- Example
```python
data = Data()
data.show_available_databases()
data.read_data('ml_100k')
method = Method(data.df)
method.show_methods()
method.run('surprise.KNNWithMeans')
predictions_df = method.predictions_df
evaluator = Evaluator(predictions_df)
evaluator.show_evaluators()
evaluator.run('surprise.mse')
```

    The available databases are:
    0: ml_100k
    1: ml_1m
    2: jester
    3: lda_topics
    4: lda_rankings
    5: uniform
    The available methods are:
    0: surprise.NormalPredictor
    1: surprise.BaselineOnly
    2: surprise.KNNBasic
    3: surprise.KNNWithMeans
    4: surprise.KNNWithZScore
    5: surprise.KNNBaseline
    6: surprise.SVD
    7: surprise.SVDpp
    8: surprise.NMF
    9: surprise.SlopeOne
    10: surprise.CoClustering
    Computing the msd similarity matrix...
    Done computing similarity matrix.
    The available evaluators are:
    0: surprise.rmse
    1: surprise.mse
    2: surprise.mae
    3: surprise.fcp
    MSE: 0.9146

Next is the code that builds the table of accuracy metrics for all rating-prediction algorithms built into the surprise package. The function returns a pandas dataframe (11x4), with one row for each of the 11 algorithms and one column for each of the 4 accuracy metrics.

```python
import contextlib
import os

import pandas as pd


def model_table(label):
    # One row per surprise algorithm, one column per accuracy metric
    rows = []

    data = Data()
    data.read_data(label)

    method = Method(data.df)

    for m in method.available_methods:
        print(m)
        method.run(m)
        predictions_df = method.predictions_df
        evaluator = Evaluator(predictions_df)

        metrics = []
        for e in evaluator.available_evaluators:
            evaluator.run(e)
            metrics.append(evaluator.acc)

        rows.append(dict(zip(evaluator.available_evaluators, metrics)))

    # Build the table from a list of dicts (DataFrame.append was removed in pandas 2.0)
    table = pd.DataFrame(rows)
    table.index = [x[9:] for x in method.available_methods]                  # strip 'surprise.'
    table.columns = [x[9:].upper() for x in evaluator.available_evaluators]  # e.g. 'RMSE'

    return table


# Silence the prints produced while the tables are computed
with open(os.devnull, 'w') as devnull, contextlib.redirect_stdout(devnull):
    uniform = model_table('uniform')
    # topics = model_table('lda_topics')
    ranking = model_table('lda_rankings')
```

- Usage Example
This section explains how a recommendation is made for the user.
```python
import random

import gradio as gr
import pandas as pd

opo = pd.read_csv('oportunidades_results.csv', lineterminator='\n')
# opo = opo.iloc[np.where(opo['opo_brazil']=='Y')]
simulation = pd.read_csv('simulation2.csv')
# The app user gets a new ID, one past the largest simulated user ID
userID = max(simulation['userID']) + 1


# Build the string displayed to the user in the app: the opportunity's title, link, and summary
def build_display_text(opo_n):
    title = opo.loc[opo_n, 'opo_titulo']
    link = opo.loc[opo_n, 'link']
    summary = opo.loc[opo_n, 'facebook-bart-large-cnn_results']
    display_text = f"**{title}**\n\nURL:\n{link}\n\nSUMMARY:\n{summary}"
    return display_text
```
Here, four random opportunities are generated.

```python
# Draw four distinct opportunities to show on the first screen
opo_n_one, opo_n_two, opo_n_three, opo_n_four = random.sample(range(len(opo)), 4)
evaluated = []  # indices of the opportunities the user has already rated
```
The next function, `predict_next`, takes the selected option and its rating (`nota`).
```python
def predict_next(option, nota):
    global userID
    global opo_n_one
    global opo_n_two
    global opo_n_three
    global opo_n_four
    global evaluated
    global opo
    global simulation
```
It first retrieves the index, in our database, of the opportunity that was rated:
```python
    # option is '1'..'4' (from the radio button); map it to the opportunity shown in that slot
    selected = [opo_n_one, opo_n_two, opo_n_three, opo_n_four][int(option) - 1]
```
Next, a new row with the user's ID, the rated item, and the rating is added to the `simulation` dataframe, and the selected opportunity is recorded as evaluated. An SVD++ model is then trained on all ratings collected so far and used to estimate the user's rating for every opportunity not yet evaluated; the four highest estimates become the new suggestions.

```python
    # Record the new rating (pd.concat replaces DataFrame.append, removed in pandas 2.0)
    new_row = pd.DataFrame([{'userID': userID, 'itemID': selected, 'rating': nota}])
    simulation = pd.concat([simulation, new_row], ignore_index=True)
    evaluated.append(selected)

    from surprise import Dataset, Reader, SVDpp

    # Retrain SVD++ on every rating collected so far
    reader = Reader(rating_scale=(1, 5))
    data = Dataset.load_from_df(simulation[['userID', 'itemID', 'rating']], reader)
    trainset = data.build_full_trainset()
    svdpp = SVDpp()
    svdpp.fit(trainset)

    # Estimate the user's rating for every opportunity not yet evaluated
    items = list()
    est = list()
    for i in range(len(opo)):
        if i not in evaluated:
            items.append(i)
            est.append(svdpp.predict(userID, i).est)

    # Take the four highest estimates; ranking positions (rather than calling
    # est.index on sorted values) keeps tied estimates from repeating an item
    top = sorted(range(len(items)), key=lambda k: est[k], reverse=True)[:4]
    opo_n_one, opo_n_two, opo_n_three, opo_n_four = (items[k] for k in top)
    return (build_display_text(opo_n_one), build_display_text(opo_n_two),
            build_display_text(opo_n_three), build_display_text(opo_n_four))
```
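
The last step of `predict_next` is a plain top-N selection over the estimated ratings. A standalone sketch with `heapq.nlargest` is shown below; `top_n_unrated` and the toy `scores` dict are hypothetical, standing in for the `svdpp.predict(userID, i).est` values and the opportunity indices. Because it ranks `(score, item)` pairs instead of searching for each score with `list.index`, tied scores still yield distinct items (ties are broken by the larger item index).

```python
import heapq


def top_n_unrated(scores, rated, n=4):
    """Return the n items with the highest estimated rating, skipping items already rated."""
    candidates = ((score, item) for item, score in scores.items() if item not in rated)
    return [item for _, item in heapq.nlargest(n, candidates)]


# Hypothetical estimates for six opportunities; opportunity 2 was already rated
scores = {0: 3.1, 1: 4.2, 2: 4.9, 3: 4.2, 4: 2.7, 5: 3.8}
print(top_n_unrated(scores, rated={2}))
# [3, 1, 5, 0]
```

Passing `rated` as a set keeps each membership check constant-time, which matters once the user has rated many opportunities.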

Finally, Gradio's `Blocks` API assembles the app interface.

```python
with gr.Blocks() as demo:
    # Four text boxes with the current suggestions ('Oportunidade' = opportunity)
    with gr.Row():
        one_opo = gr.Textbox(build_display_text(opo_n_one), label='Oportunidade 1')
        two_opo = gr.Textbox(build_display_text(opo_n_two), label='Oportunidade 2')
    with gr.Row():
        three_opo = gr.Textbox(build_display_text(opo_n_three), label='Oportunidade 3')
        four_opo = gr.Textbox(build_display_text(opo_n_four), label='Oportunidade 4')
    # Which suggestion is being rated ('Opção' = option)
    with gr.Row():
        option = gr.Radio(['1', '2', '3', '4'], label='Opção', value='1')
    # Rating from 1 to 5 ('Nota' = grade)
    with gr.Row():
        nota = gr.Slider(1, 5, step=1, label="Nota 1")
    # 'Confirmar' = confirm: submit the rating and refresh the four suggestions
    with gr.Row():
        confirm = gr.Button("Confirmar")
    confirm.click(fn=predict_next,
                  inputs=[option, nota],
                  outputs=[one_opo, two_opo, three_opo, four_opo])

if __name__ == "__main__":
    demo.launch()
```

## Benchmarks

```python
# LDA-GENERATED DATASET
ranking
```
| Algorithm       | RMSE     | MSE      | MAE      | FCP      |
|-----------------|----------|----------|----------|----------|
| NormalPredictor | 1.820737 | 3.315084 | 1.475522 | 0.514134 |
| BaselineOnly    | 1.072843 | 1.150992 | 0.890233 | 0.556560 |
| KNNBasic        | 1.232248 | 1.518436 | 0.936799 | 0.648604 |
| KNNWithMeans    | 1.124166 | 1.263750 | 0.808329 | 0.597148 |
| KNNWithZScore   | 1.056550 | 1.116299 | 0.750004 | 0.669651 |
| KNNBaseline     | 1.134660 | 1.287454 | 0.825161 | 0.614270 |
| SVD             | 0.977468 | 0.955444 | 0.757485 | 0.723829 |
| SVDpp           | 0.843065 | 0.710758 | 0.670516 | 0.671737 |
| NMF             | 1.122684 | 1.260420 | 0.722101 | 0.688728 |
| SlopeOne        | 1.073552 | 1.152514 | 0.747142 | 0.651937 |
| CoClustering    | 1.293383 | 1.672838 | 1.007951 | 0.494174 |

```python
# BENCHMARK DATASET
uniform
```
| Algorithm       | RMSE     | MSE      | MAE      | FCP      |
|-----------------|----------|----------|----------|----------|
| NormalPredictor | 1.508925 | 2.276854 | 1.226758 | 0.503723 |
| BaselineOnly    | 1.153331 | 1.330172 | 1.022732 | 0.506818 |
| KNNBasic        | 1.205058 | 1.452165 | 1.026591 | 0.501168 |
| KNNWithMeans    | 1.202024 | 1.444862 | 1.028149 | 0.503527 |
| KNNWithZScore   | 1.216041 | 1.478756 | 1.041070 | 0.501582 |
| KNNBaseline     | 1.225609 | 1.502117 | 1.048107 | 0.498198 |
| SVD             | 1.176273 | 1.383619 | 1.013285 | 0.502067 |
| SVDpp           | 1.192619 | 1.422340 | 1.018717 | 0.500909 |
| NMF             | 1.338216 | 1.790821 | 1.120604 | 0.492944 |
| SlopeOne        | 1.224219 | 1.498713 | 1.047170 | 0.494298 |
| CoClustering    | 1.223020 | 1.495778 | 1.033699 | 0.518509 |

### BibTeX entry and citation info
```bibtex
@unpublished{recommend22,
  author = {Jo\~{a}o Gabriel de Moraes Souza and Daniel Oliveira Cajueiro and Johnathan de O. Milagres and Vin\'{i}cius de Oliveira Watanabe and V\'{i}tor Bandeira Borges and Victor Rafael Celestino},
  title  = {A comprehensive review of recommendation systems: method, data, evaluation and coding},
}
```
<a href="https://huggingface.co/exbert/?model=bert-base-uncased">
	<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
</a>