arxiv:2403.16630

A comparative analysis of embedding models for patent similarity

Published on Mar 25, 2024

Authors:

Abstract

Patent-specific pretrained embedding models, including static word embeddings and contextual word embeddings, are evaluated for patent similarity calculation, with domain-adapted Sentence Transformers showing superior performance.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

This paper makes two contributions to the field of text-based patent similarity. First, it compares the performance of different kinds of patent-specific pretrained embedding models, namely static word embeddings (such as word2vec and doc2vec models) and contextual word embeddings (such as transformers based models), on the task of patent similarity calculation. Second, it compares specifically the performance of Sentence Transformers (SBERT) architectures with different training phases on the patent similarity task. To assess the models' performance, we use information about patent interferences, a phenomenon in which two or more patent claims belonging to different patent applications are proven to be overlapping by patent examiners. Therefore, we use these interferences cases as a proxy for maximum similarity between two patents, treating them as ground-truth to evaluate the performance of the different embedding models. Our results point out that, first, Patent SBERT-adapt-ub, the domain adaptation of the pretrained Sentence Transformer architecture proposed in this research, outperforms the current state-of-the-art in patent similarity. Second, they show that, in some cases, large static models performances are still comparable to contextual ones when trained on extensive data; thus, we believe that the superiority in the performance of contextual embeddings may not be related to the actual architecture but rather to the way the training phase is performed.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2403.16630

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2403.16630 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2403.16630 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.