A DeBERTa-v2 model pre-trained on a large corpus of Korean patent data.

Pre-training used text drawn mainly from the abstract, claims, and description sections of patent documents.

It is a Korean language model that can be used for tasks such as computing patent document embeddings or classifying patent documents.

Patent text embedding example

import torch
from transformers import AutoModel, AutoTokenizer

# Sample Korean patent abstract (a patent search system and method), followed by its keywords.
patent_abstract = '''본 발명은 특허 검색 시스템 및 검색 방법에 관한 것으로, 보다 자세하게는 입력한 검색어의 동의어를 제공, 검색어를 자동으로 번역하여 국가에 상관없이 검색을 가능토록 하거나 대분류, 중분류, 소분류 등 분류한 검색어를 조합하여 검색을 행함으로써, 효율적인 선행기술을 검색할 수 있도록 하는 특허 검색 시스템 및 검색 방법에 관한 것이다.
특허 검색, 유사도, 키워드 추출, 검색식 '''

# Load the tokenizer, then truncate or pad the input to exactly 512 tokens.
tokenizer = AutoTokenizer.from_pretrained("LDKSolutions/KR-patent-deberta-large")

encoded_inputs = tokenizer(patent_abstract, max_length=512, truncation=True, padding="max_length", return_tensors="pt")

# Load the pre-trained encoder and switch it to inference mode.
model = AutoModel.from_pretrained("LDKSolutions/KR-patent-deberta-large")

model.eval()

with torch.no_grad():
    # CLS pooling: take the last hidden state of the first token.
    outputs = model(**encoded_inputs)[0][:, 0, :]
    print(outputs.shape)  # [1, 2048]
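
Once embeddings like the one above are available for several patents, one common way to compare them is cosine similarity. The sketch below is an illustration only, not part of the model or its library: `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, and the short toy vectors stand in for real CLS embeddings (which could be obtained from the tensor above with `outputs[0].tolist()`).

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embedding vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors standing in for real patent embeddings.
query_vec = [1.0, 0.0, 1.0]
candidate_vecs = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]

ranking = rank_by_similarity(query_vec, candidate_vecs)
print(ranking[0][0])  # index of the candidate most similar to the query
```

For larger collections, the same computation would normally be done on batched tensors rather than Python lists; the pure-Python version is used here only to keep the sketch dependency-free.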