---
license: mit
datasets:
- WhereIsAI/github-issue-similarity
language:
- en
library_name: sentence-transformers
pipeline_tag: feature-extraction
---

# WhereIsAI/UAE-Code-Large-V1

📢 `WhereIsAI/UAE-Code-Large-V1` **is licensed under MIT. Feel free to use it in any scenario.**
If you use it for academic papers, we would greatly appreciate it if you could cite us. 👉 [citation info](#citation).

This model builds upon [WhereIsAI/UAE-Large-V1](https://huggingface.co/WhereIsAI/UAE-Large-V1) and is fine-tuned on the [GIS: Github Issue Similarity](https://huggingface.co/datasets/WhereIsAI/github-issue-similarity) dataset using [AnglE](https://github.com/SeanLee97/AnglE) loss (https://arxiv.org/abs/2309.12871).
It can be used to measure **code/issue similarity**.

Results (test set):

- Spearman correlation: 71.19
- Accuracy: 84.37

## Usage

### 1. angle-emb

You can use it via `angle-emb` as follows:

install:

```
python -m pip install -U angle-emb
```

example:

```python
from scipy import spatial
from angle_emb import AnglE

model = AnglE.from_pretrained('WhereIsAI/UAE-Code-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)

# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''

bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])

print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2)', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

output:

```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2) 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```

### 2. sentence-transformers

You can also use it via `sentence-transformers`:

```python
from scipy import spatial
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('WhereIsAI/UAE-Code-Large-V1').cuda()

quick_sort = '''# Approach 2: Quicksort using list comprehension

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[0]
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        return quicksort(left) + [pivot] + quicksort(right)

# Example usage
arr = [1, 7, 4, 1, 10, 9, -2]
sorted_arr = quicksort(arr)
print("Sorted Array in Ascending Order:")
print(sorted_arr)'''

bubble_sort = '''def bubblesort(elements):
    # Looping from size of array from last index[-1] to index [0]
    for n in range(len(elements)-1, 0, -1):
        swapped = False
        for i in range(n):
            if elements[i] > elements[i + 1]:
                swapped = True
                # swapping data if the element is less than next element in the array
                elements[i], elements[i + 1] = elements[i + 1], elements[i]
        if not swapped:
            # exiting the function if we didn't make a single swap
            # meaning that the array is already sorted.
            return

elements = [39, 12, 18, 85, 72, 10, 2, 18]

print("Unsorted list is,")
print(elements)
bubblesort(elements)
print("Sorted Array is, ")
print(elements)'''

vecs = model.encode([
    'def echo(): print("hello world")',
    quick_sort,
    bubble_sort
])

print('cos sim (0, 1):', 1 - spatial.distance.cosine(vecs[0], vecs[1]))
print('cos sim (0, 2)', 1 - spatial.distance.cosine(vecs[0], vecs[2]))
print('cos sim (1, 2):', 1 - spatial.distance.cosine(vecs[1], vecs[2]))
```

output:

```
cos sim (0, 1): 0.34329649806022644
cos sim (0, 2) 0.3627094626426697
cos sim (1, 2): 0.6972219347953796
```

# Citation

```bibtex
@article{li2023angle,
  title={AnglE-optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}
```
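
When comparing more than a few snippets, the pairwise cosine similarities from the usage examples can be computed in one step. The following is a minimal sketch, not part of `angle-emb` or `sentence-transformers`: `cosine_similarity_matrix` is a hypothetical helper, and the toy 2-D vectors are placeholders for the array returned by `model.encode([...])`, which is assumed to have shape `(n_texts, dim)`.

```python
import numpy as np


def cosine_similarity_matrix(vecs):
    """Return the matrix of pairwise cosine similarities between the rows of `vecs`.

    Hypothetical helper for illustration only.
    """
    vecs = np.asarray(vecs, dtype=np.float64)
    # L2-normalise each row so that a plain dot product equals cosine similarity.
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    unit = vecs / np.clip(norms, 1e-12, None)  # clip guards against zero vectors
    return unit @ unit.T


# Toy vectors standing in for the output of `model.encode([...])` (an assumption).
toy_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sims = cosine_similarity_matrix(toy_vecs)
print(np.round(sims, 3))
```

Entry `(i, j)` of the result plays the same role as `1 - spatial.distance.cosine(vecs[i], vecs[j])` in the usage examples.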