---
license: apache-2.0
tags:
- eu
- public procurement
- cpv
- sector
- multilingual
- transformers
- text-classification
widget:
- text: "Oppegård municipality, hereafter called the contracting authority, intends to enter into a framework agreement with one supplier for the procurement of fresh bread and bakery products for Oppegård municipality. The contract is estimated to NOK 1 400 000 per annum excluding VAT The total for the entire period including options is NOK 5 600 000 excluding VAT"
---

# multilingual-cpv-sector-classifier

This model is a fine-tuned version of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) on [the Tenders Electronic Daily public procurement data](https://simap.ted.europa.eu/en).
It achieves the following results on the evaluation set:
- F1 Score: 0.686

## Model description

The model takes procurement descriptions written in any of [104 languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages) and classifies them into 45 sector classes, represented by the [CPV (Common Procurement Vocabulary)](https://simap.ted.europa.eu/en_GB/web/simap/cpv) code descriptions listed below.

| Common Procurement Vocabulary |
|:-----------------------------|
| Administration, defence and social security services. 👮‍♀️ |
| Agricultural machinery. 🚜 |
| Agricultural, farming, fishing, forestry and related products. 🌾 |
| Agricultural, forestry, horticultural, aquacultural and apicultural services. 👨🏿‍🌾 |
| Architectural, construction, engineering and inspection services. 👷‍♂️ |
| Business services: law, marketing, consulting, recruitment, printing and security. 👩‍💼 |
| Chemical products. 🧪 |
| Clothing, footwear, luggage articles and accessories. 👖 |
| Collected and purified water. 🌊 |
| Construction structures and materials; auxiliary products to construction (except electric apparatus). 🧱 |
| Construction work. 🏗️ |
| Education and training services. 👩🏿‍🏫 |
| Electrical machinery, apparatus, equipment and consumables; Lighting. ⚡ |
| Financial and insurance services. 👨‍💼 |
| Food, beverages, tobacco and related products. 🍽️ |
| Furniture (incl. office furniture), furnishings, domestic appliances (excl. lighting) and cleaning products. 🗄️ |
| Health and social work services. 👨🏽‍⚕️ |
| Hotel, restaurant and retail trade services. 🏨 |
| IT services: consulting, software development, Internet and support. 🖥️ |
| Industrial machinery. 🏭 |
| Installation services (except software). 🛠️ |
| Laboratory, optical and precision equipments (excl. glasses). 🔬 |
| Leather and textile fabrics, plastic and rubber materials. 🧵 |
| Machinery for mining, quarrying, construction equipment. ⛏️ |
| Medical equipments, pharmaceuticals and personal care products. 💉 |
| Mining, basic metals and related products. ⚙️ |
| Musical instruments, sport goods, games, toys, handicraft, art materials and accessories. 🎸 |
| Office and computing machinery, equipment and supplies except furniture and software packages. 🖨️ |
| Other community, social and personal services. 🧑🏽‍🤝‍🧑🏽 |
| Petroleum products, fuel, electricity and other sources of energy. 🔋 |
| Postal and telecommunications services. 📶 |
| Printed matter and related products. 📰 |
| Public utilities. ⛲ |
| Radio, television, communication, telecommunication and related equipment. 📡 |
| Real estate services. 🏠 |
| Recreational, cultural and sporting services. 🚴 |
| Repair and maintenance services. 🔧 |
| Research and development services and related consultancy services. 👩‍🔬 |
| Security, fire-fighting, police and defence equipment. 🧯 |
| Services related to the oil and gas industry. ⛽ |
| Sewage-, refuse-, cleaning-, and environmental services. 🧹 |
| Software package and information systems. 🔣 |
| Supporting and auxiliary transport services; travel agencies services. 🚃 |
| Transport equipment and auxiliary products to transportation. 🚌 |
| Transport services (excl. Waste transport). 💺 |

## Intended uses & limitations

- Input descriptions should be written in one of [the 104 languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages) that mBERT supports.
- The model was evaluated in only 22 languages, so its performance in other languages is unknown.
- The training domain is limited to descriptions of awarded procurement notices in the European Union. Performance may differ on full document texts.

## Training and evaluation data

- The full dataset consists of 744,360 rows, shuffled and split 80%/20% into training and validation sets.
- Each row is a unique contract notice description for a contract awarded between 2011 and 2018.
- Both the training and validation sets contain contract notice descriptions written in 22 European languages. (Maltese and Irish were excluded because they are scarce relative to the rest of the data.)

## Training procedure

Training was run on Google Cloud v3-8 TPUs.
Thanks to [Google](https://sites.research.google/trc/about/) for providing access to Cloud TPUs.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- num_epochs: 3
- gradient_accumulation_steps: 8
- batch_size_per_device: 4
- total_train_batch_size: 32

### Training results

| Epoch | Step | F1 Score |
|:-----:|:------:|:------:|
| 1 | 18,609 | 0.630 |
| 2 | 37,218 | 0.674 |
| 3 | 55,827 | 0.686 |

| Language | F1 Score | Test Size |
|:-----:|:-----:|:-----:|
| PL | 0.759 | 13950 |
| RO | 0.736 | 3522 |
| SK | 0.719 | 1122 |
| LT | 0.687 | 2424 |
| HU | 0.681 | 1879 |
| BG | 0.675 | 2459 |
| CS | 0.668 | 2694 |
| LV | 0.664 | 836 |
| DE | 0.645 | 35354 |
| FI | 0.644 | 1898 |
| ES | 0.643 | 7483 |
| PT | 0.631 | 874 |
| EN | 0.631 | 16615 |
| HR | 0.626 | 865 |
| IT | 0.626 | 8035 |
| NL | 0.624 | 5640 |
| EL | 0.623 | 1724 |
| SL | 0.615 | 482 |
| SV | 0.607 | 3326 |
| DA | 0.603 | 1925 |
| FR | 0.601 | 33113 |
| ET | 0.572 | 458 |
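
The step counts in the training results table can be cross-checked against the dataset size and the batch size quoted in this card. The following is a minimal sketch, assuming the 80% training split is taken over all 744,360 rows and that each step consumes one total batch of 32 examples. The constant and function names are illustrative, and the card does not state how the eight TPU cores factor into the batch size.

```python
# Figures quoted in this card.
TOTAL_ROWS = 744_360
TRAIN_PERCENT = 80
BATCH_SIZE_PER_DEVICE = 4
GRADIENT_ACCUMULATION_STEPS = 8
TOTAL_TRAIN_BATCH_SIZE = 32
NUM_EPOCHS = 3


def cumulative_steps() -> list:
    """Return the number of steps completed at the end of each epoch."""
    # Integer arithmetic avoids floating-point rounding: 595,488 training rows.
    train_rows = TOTAL_ROWS * TRAIN_PERCENT // 100
    steps_per_epoch = train_rows // TOTAL_TRAIN_BATCH_SIZE
    return [steps_per_epoch * epoch for epoch in range(1, NUM_EPOCHS + 1)]


# Assumption: total batch size = per-device batch size x accumulation steps.
assert BATCH_SIZE_PER_DEVICE * GRADIENT_ACCUMULATION_STEPS == TOTAL_TRAIN_BATCH_SIZE

print(cumulative_steps())  # [18609, 37218, 55827]
```

Under these assumptions the computed values agree with the Step column of the per-epoch results.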
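
For the intended use described in this card, classifying a procurement description into a sector, a short inference sketch with the Transformers `pipeline` API may be helpful. It is not the author's own code. The helper names `load_classifier` and `top_sectors` are hypothetical, and `model_id` must be supplied by the caller because this card does not state the Hub repository ID. The exact label strings returned depend on the model's configuration.

```python
from typing import Callable, List, Tuple


def load_classifier(model_id: str):
    """Build a text-classification pipeline for a Hub repository ID.

    `model_id` is an assumption: the caller must provide the repository ID
    under which this model is published.
    """
    # Imported lazily so top_sectors() can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def top_sectors(classifier: Callable, text: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring sector labels with rounded scores."""
    # truncation=True keeps long descriptions within the model's input limit.
    predictions = classifier(text, top_k=k, truncation=True)
    # Some transformers versions wrap single-input results in an extra list.
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    return [(p["label"], round(p["score"], 3)) for p in predictions]


# Example usage (requires transformers, a backend such as PyTorch, and network access):
# classifier = load_classifier("<hub-repository-id-of-this-model>")
# print(top_sectors(classifier, "Framework agreement for fresh bread and bakery products."))
```

Returning several candidates rather than a single label seems reasonable here, given that the reported F1 scores sit between roughly 0.57 and 0.76 depending on language.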