seongil-dn committed on
Commit
2ab58eb
1 Parent(s): fac4cce

Add new SentenceTransformer model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+{
+    "word_embedding_dimension": 768,
+    "pooling_mode_cls_token": true,
+    "pooling_mode_mean_tokens": false,
+    "pooling_mode_max_tokens": false,
+    "pooling_mode_mean_sqrt_len_tokens": false,
+    "pooling_mode_weightedmean_tokens": false,
+    "pooling_mode_lasttoken": false,
+    "include_prompt": true
+}
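This config selects CLS pooling: the Pooling module keeps only the first ([CLS]) token's embedding instead of averaging token embeddings. A minimal sketch of the selected operation (illustrative tensors, not the library's internal implementation):

```python
import torch

# (batch, seq_len, hidden) token embeddings from the transformer
token_embeddings = torch.randn(2, 16, 768)

# pooling_mode_cls_token=true: keep only the first token per sequence
sentence_embeddings = token_embeddings[:, 0]  # shape (2, 768)
```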
README.md ADDED
@@ -0,0 +1,1674 @@
+---
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:816532
+- loss:MultipleNegativesRankingLoss
+base_model: Alibaba-NLP/gte-multilingual-base
+widget:
+- source_sentence: 김택용이 스타크래프트2에서 첫 승리를 거둔 시기는 언제인가?
+  sentences:
+  - 2008년 11월 22일, 김택용은 클럽데이 온라인 MSL 결승전에서 허영무에게 선승을 내준 후 내리 3연승, 3:1 쾌승을 거두며 자신의
+    세 번째 MSL 우승을 달성하였다. 이를 통해 김택용은 프로토스 최초 개인리그 3회 우승자 및 역대 네 번째 금배지(MSL 3회 우승의 상징)
+    획득자가 되었다.
+  - '김택용은 새로 개막한 SK플래닛 프로리그 시즌2에서 스타크래프트: 브루드 워, 스타크래프트 Ⅱ를 병행해서 출전했다. 스타크래프트 브루드워
+    실력은 여전히 건재하지만, 스타Ⅱ에서는 스타크래프트 브루드워에서의 실력을 내지 못했다. 2012년 8월까지 택뱅리쌍 일원 중에서 김택용만 유일하게
+    스타Ⅱ에서의 승리를 하지 못했다. (0승 6패) 더군다나 2012년 봄까지만 해도 스타Ⅱ를 완전히 이해하지 못한듯한 플레이를 보이고 있었지만,
+    김택용은 2012년 여름이 되어서 스타Ⅱ를 서서히 실력을 쌓고 있었다. 기존의 스타크래프트 브루드워 스타리그가 스타크래프트 Ⅱ로 종목 전환한
+    뒤에 열린 첫 예선에 참가했으나, 스타Ⅱ의 부족한 실력을 여실히 들어내면서 1:2로 신예선수에게 지며 예선탈락하였다. 또한 GSL 선수들과
+    맞붙은 WCS 예선에서 프나틱의 장재호를 만나 무기력하게 0:2로 패배하여 탈락하였고, WCG 2012 예선에서도 백동준에게 0:2로 패배해
+    스타Ⅱ 종목으로 열린 경기에서 모두 패배하였다. 김택용은 스타2리그 뿐만아니라 스타1리그에서도 2010년 여름부터 3년째 스타리그에 이름을
+    올리지 못했다. 2012년 8월 12일 마침내 염보성을 상대로 어렵게 프로리그 스타2 종목에서 처음으로 승리를 거두었다(1승 6패). 결국
+    부진을 극복하지 못한 채 2012년 8월 케스파 랭킹 22위로까지 떨어지고 말았다. 하지만 그 후 2012년 8월 18일 김정우 마저 김택용의
+    스타2 승리 제물이 되었다. 엘리전까지 가는 혈전 끝에 스타Ⅱ에서 두각을 돋보이는 김정우를 격파하였고, 2012년 9월 2일 SK플래닛 스타
+    프로리그 시즌2 준플레이오프 2차전에서 다시 한번 염보성을 스타Ⅱ로 격파하면서 조금씩 기세를 올렸다.'
+  - 이소룡의 아버지는 유명한 광둥 경극 배우였으며, 아버지의 뒤를 이어 아주 어린 나이부터 영화를 접하게 되었고, 생후 3개월에 《금문녀》라는
+    영화로 데뷔하였다. 그가 18세가 되었을 때 이미 그는 스무 편의 영화에 출연한 상태였다.
+- source_sentence: 페니스가 없는 여성의 심리적 반응은 어떠한가?
+  sentences:
+  - PIRA는 무장해제위원회(Decommingsioning Commission)에 의해 2005년 10월 무장투쟁을 포기했음을 확인받았으며, 우익
+    민주연합당(DUP)를 제외한 정당들도 이를 인정했다. 단, DUP에서는 증거가 없다며 무장투쟁포기사실을 인정하지 않았는데, 이는 DUP가 PIRA를
+    통해서 존재할 수 있기 때문이다. 그 실례로 북아일랜드의 수도 벨파스트에서 발행하는 일간지에선 PIRA 지도자 오닐이 무장투쟁을 포기하자,
+    민주연합당 지도자 이언 페이즐리(Ian Paisley)가 "가지마! 난 네가 필요해!"라고 말하는 내용의 풍자만화를 실었다.
+  - 성적 만족을 위해서라면 정신적인 사랑 없이 육체적 결합이 가능하다고 주장하였다. 정분이 없이도 성교가 가능하며 성관계는 일종의 오락 내지는
+    친밀행위에 지나지 않는다고 보았다. 그러나 이는 보수적인 유학자들 외에도 남성 지식인과 기독교계열의 반발을 불러왔다.
+  - 첫째는 "자신에게 페니스가 없는"것을 강하게 자각하고, 완전하게 페니스가 없는 존재로 받아들일 것이다. 이것은 열등감을 가진 여자를 만든다.
+    이 경우 무기력한 인간이 되어버린다고 한다. 둘째는 "자신은 페니스가 언젠가 나오고, 나는 남자"라고 믿고, 남성적인 성격을 갖출 경우이다.
+    세 번째는 성기라는 대상을 선망할 때 성기를 "페니스 → 아이"라는 상징으로 생각하고, 아이를 손에 넣는 길을 선택하는 경우이다.
+- source_sentence: 신탁청은 언제 해체되었는가?
+  sentences:
+  - 신탁통치령(信託統治領, ) 혹은 신탁통치 지역(信託統治 地域)은 국제 연맹 위임통치령의 후신으로 제2차 세계 대전의 종전과 함께 국제 연맹이
+    유엔으로 대체됨에 따라 생겨났다.다음 11개 지역이 신탁통치령이었다. 1994년 10월 팔라우 독립을 마지막으로 신탁통치령은 소멸되었다.
+  - 히가시코게 역()은 일본 돗토리현 야즈 군 야즈 정에 위치한 서일본 여객철도 인비 선의 철도역이다. 단선 승강장 1면 1선의 구조를 갖춘 지상역이다.
+  - 신탁청은 1994년 12월 31일 해체될 때까지 15,102개의 기업체를 매각하고 4358개의 기업체를 재사유화했으며, 호텔, 식당, 약국
+    및 서점 등 소규모 사업장 25,030개를 사유화하고 46,552건의 부동산을 매각해 총 91,042건의 사유화를 기록했다. 이를 통해 666억
+    마르크의 매각수익을 올리고, 2111억 마르크의 투자와 150만 개의 일자리를 보장받았다. 초기에 추산되었던 기업가치가 약 6000억 마르크였던
+    것에 비하면 1/10 수준밖에 되지 않은 턱없이 낮은 매각수익이다. 사유화된 15,000여 기업 중 구동독인들에 의한 매입은― 주로 경영자기업인수(MBO)
+    혹은 종업원기업인수(EBO) ― 6%에 지나지않았고, 외국인 투자자 매입도 사유화 전체 기업 중 9% 정도로 나타났다.
+- source_sentence: 석신산의 탈수 반응 생성물은 무엇인가요?
+  sentences:
+  - 석신산은 푸마르산으로 산화되거나 다이에틸석시네이트(diethylsuccinate, (CHCOCHCH))와 같은 다이에스터로 전환될 수 있다.
+    이러한 다이에틸 에스터(diethyl ester)는 스토브 축합(Stobbe condensation) 반응의 기질이다. 석신산의 탈수는 석신산
+    무수물을 생성한다. 석신산은 1,4-뷰테인다이올, 말레산 무수물, 석신이미드, 2-피롤리디논 및 테트라하이드로푸란을 유도하는데 사용될 수 있다.
+  - 2006년 ‘동의대 5·3 동지회’ 회원 등은 “동의대 사건 이후 경찰 조사 과정에서 고문 등 인권침해가 있었다”며 진실·화해를 위한 과거사
+    정리 위원회(이하 진실화해위)에 진실규명을 신청하였다. 이로 인해 진실화해위 소위원회는 “구타 등 인권침해가 있어 국가가 사과해야 한다”는
+    내용의 조사 결과 보고서를 심의·의결, 2010년 1월 19일에 열린 진실화해위 전원위원회에 상정했으나, “진실화해위는 ‘권위주의 통치’ 시기에
+    일어난 일을 조사 대상으로 삼는데, 동의대 사건은 노태우 정권 시절에 일어난 일이므로 조사 대상 자체가 되지 않는다”며 재적위원 과반수가 이
+    사건을 각하하기로 의결해 사건이 각하되었다. 다음날인 1월 20일에는 조사하지 않기로 했다고 밝힘으로서, 보고서 내용은 논의조차 되지 못한
+    것으로 전해졌다.
+  - 저산소 상태에서 석신산의 축적은 활성 산소 생산의 증가에 의한 허혈 재관류 손상(reperfusion injury)과 관련이 있다. 허혈(ischemia)
+    동안 푸마르산은 퓨린 뉴클레오타이드의 분해 및 말산-아스파르트산 셔틀의 역방향 반응의 일부분으로부터 형성된다. 과도한 푸마르산은 석신산 탈수소효소의
+    역반응을 통해 석신산의 생산 및 축적을 야기한다. 재관류시 석신산은 신속하게 산화되어 활성산소의 갑작스럽고 광범위한 생성을 초래한다. 활성산소는
+    세포자살 기작을 촉발시키거나 단백질, 세포막, 세포소기관 등에 산화적 손상을 유발한다. 동물 모델에서 허혈성 석신산 축적의 약리학적 억제는
+    허혈 재관류 손상을 개선시켰다. 현재 석신산 매개 활성산소 생성의 억제는 약물 치료의 표적으로 조사 중이다.
+- source_sentence: 파올로 말디니는 어떤 선수인가요?
+  sentences:
+  - 체사레 말디니는 1954년부터 1966년까지 AC 밀란에서 뛰었고, 아들 파올로 말디니는 1985년부터 2009년까지 AC 밀란에서 뛰었으며,
+    손자 크리스티안 말디니가 2005년 10월 18일 AC 밀란 유스팀에 입단해 3부자가 모두 AC 밀란에서 활약하게 되었다.
+  - 파올로 체사레 말디니 (, 1968년 6월 26일, 이탈리아 밀라노 ~ )는 이탈리아의 은퇴한 축구 선수로, 포지션은 왼쪽 풀백과 센터백이었다.
+    그는 밀란의 전설적인 수비수 였을 뿐 아니라 역대 최고 수비수로도 불릴 만큼 대단한 선수였다. 현재 밀란의 스포츠 전략 & 개발 디렉터로 활동하고
+    있다.
+  - 조 주니어(Joe Junior, 본명은 Jose Maria Rodrigues, Jr.(조즈 마리아 로드리게스 주니어), 중문명(中文名)은 羅利期(뤄리지,
+    나이기), 1947년 7월 22일 ~ )는 영국 국적자 신분의 포르투갈계 영국인 남성으로 중화인민공화국 마카오 특별행정구에서 출생한 중화인민공화국
+    홍콩 특별행정구의 가수, 작사가, 영화배우, 텔레비전 연기자이다.
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+---
+
+# SentenceTransformer based on Alibaba-NLP/gte-multilingual-base
+
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+## Model Details
+
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) <!-- at revision 3f013725dc4dcee1e4ca72d1ce7e053c0dcee5ef -->
+- **Maximum Sequence Length:** 1024 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** Cosine Similarity
+<!-- - **Training Dataset:** Unknown -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+
+### Model Sources
+
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+### Full Model Architecture
+
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: NewModel
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
+
+## Usage
+
+### Direct Usage (Sentence Transformers)
+
+First install the Sentence Transformers library:
+
+```bash
+pip install -U sentence-transformers
+```
+
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+
+# Download from the 🤗 Hub
+# trust_remote_code is required because the gte base uses a custom NewModel architecture
+model = SentenceTransformer("seongil-dn/gte-base-250k-answerableHN", trust_remote_code=True)
+# Run inference
+sentences = [
+    '파올로 말디니는 어떤 선수인가요?',
+    '파올로 체사레 말디니 (, 1968년 6월 26일, 이탈리아 밀라노 ~ )는 이탈리아의 은퇴한 축구 선수로, 포지션은 왼쪽 풀백과 센터백이었다. 그는 밀란의 전설적인 수비수 였을 뿐 아니라 역대 최고 수비수로도 불릴 만큼 대단한 선수였다. 현재 밀란의 스포츠 전략 & 개발 디렉터로 활동하고 있다.',
+    '체사레 말디니는 1954년부터 1966년까지 AC 밀란에서 뛰었고, 아들 파올로 말디니는 1985년부터 2009년까지 AC 밀란에서 뛰었으며, 손자 크리스티안 말디니가 2005년 10월 18일 AC 밀란 유스팀에 입단해 3부자가 모두 AC 밀란에서 활약하게 되었다.',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# [3, 768]
+
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# [3, 3]
+```
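+
+Because the architecture ends with a `Normalize()` module, the embeddings come back unit-length, so a plain dot product reproduces the cosine similarities. A quick check continuing the example above (`model.encode` returns NumPy arrays by default):
+
+```python
+import numpy as np
+
+# Each row is L2-normalized by the final Normalize() module.
+print(np.linalg.norm(embeddings, axis=1))  # ~[1. 1. 1.]
+# Dot products of unit vectors equal cosine similarities.
+print(embeddings @ embeddings.T)  # same values as `similarities` above
+```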
+
+<!--
+### Direct Usage (Transformers)
+
+<details><summary>Click to see the direct usage in Transformers</summary>
+
+</details>
+-->
+
+<!--
+### Downstream Usage (Sentence Transformers)
+
+You can finetune this model on your own dataset.
+
+<details><summary>Click to expand</summary>
+
+</details>
+-->
+
+<!--
+### Out-of-Scope Use
+
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+
+<!--
+## Bias, Risks and Limitations
+
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+
+<!--
+### Recommendations
+
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+
+## Training Details
+
+### Training Dataset
+
+#### Unnamed Dataset
+
+
+* Size: 816,532 training samples
+* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+* Approximate statistics based on the first 1000 samples:
+  | | anchor | positive | negative |
+  |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
+  | type | string | string | string |
+  | details | <ul><li>min: 9 tokens</li><li>mean: 17.22 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 46 tokens</li><li>mean: 144.47 tokens</li><li>max: 621 tokens</li></ul> | <ul><li>min: 46 tokens</li><li>mean: 169.92 tokens</li><li>max: 1024 tokens</li></ul> |
+* Samples:
+  | anchor | positive | negative |
+  |:-------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+  | <code>별의 나이는 어떻게 측정하는가?</code> | <code>별의 나이는 토륨과 다른 성분들에 의해 만들어진 스펙트럼선들의 상대적인 힘을 측정하기 위해 초거대망원경의 자외선 분광기를 사용하여 추측한다. 선의 힘은 여러 가지 다양한 동위원소를 만들어내는데, 그것들로부터 핵우주 연대학을 사용하여 별의 나이를 짐작하는 것이다.</code> | <code>아들이 아버지보다 나이가 많을 수 없는 것처럼, 우주 안의 천체는 당연히 우주보다는 젊어야 하기 때문에, 여러 종류의 천체를 관측하여 그 나이를 추정하는 것으로 우주의 나이의 하한선을 얻을 수 있다. 가장 많이 쓰이는 방법 중 하나는 가장 온도가 낮은 백색왜성의 나이를 측정하는 것이다. 백색왜성은 태양과 비슷한 질량을 가진 별들이 죽으면서 만들어지는데, 백색왜성은 당시 가지고 있던 열 이외에 다른 에너지원이 없기 때문에 나이가 들면서 점점 식고, 어두워지게 된다. 따라서 가장 어둡고, 가장 온도가 낮은 백색왜성을 찾아서 그 냉각 나이를 측정하면 우주의 나이의 하한선을 얻을 수 있다.</code> |
+  | <code>별의 나이는 어떻게 측정하는가?</code> | <code>별의 나이는 토륨과 다른 성분들에 의해 만들어진 스펙트럼선들의 상대적인 힘을 측정하기 위해 초거대망원경의 자외선 분광기를 사용하여 추측한다. 선의 힘은 여러 가지 다양한 동위원소를 만들어내는데, 그것들로부터 핵우주 연대학을 사용하여 별의 나이를 짐작하는 것이다.</code> | <code>이 별의 물리적 수치는 태양과 비슷한데 분광형이 태양과 똑같은 G2V 여서 유사 태양으로 분류할 수 있다. 질량은 태양보다 9 퍼센트 무겁고 반지름은 태양보다 1 퍼센트 작다. 나이는 상대적으로 젊어 약 8천만 ~ 2억 년으로 보인다. 젊은 별인만큼 자전 속도는 3.5일에 한 번 돌 정도로 빠르며 자전축은 시선방향에 대해 21도(오차범위 +8, -9도) 기울어져 있다.</code> |
+  | <code>별의 나이는 어떻게 측정하는가?</code> | <code>별의 나이는 토륨과 다른 성분들에 의해 만들어진 스펙트럼선들의 상대적인 힘을 측정하기 위해 초거대망원경의 자외선 분광기를 사용하여 추측한다. 선의 힘은 여러 가지 다양한 동위원소를 만들어내는데, 그것들로부터 핵우주 연대학을 사용하여 별의 나이를 짐작하는 것이다.</code> | <code>여기서 "v"는 적도에서의 각속도이며 "t"는 별의 나이이다. 이 관계식은 1972년 앤드류 P. 스쿠마니치가 발견했으며 그의 이름을 따서 '스쿠마니치의 법칙'으로 불린다. 자이로연대학(Gyrochronology)은 태양의 속도를 기준점으로 한 항성의 자전 속도에 기초하여, 그 별의 나이를 결정하는 것이다.</code> |
+* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+  ```json
+  {
+      "scale": 20.0,
+      "similarity_fct": "cos_sim"
+  }
+  ```
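+
+For readers who want to reproduce this setup, the sketch below shows how (anchor, positive, negative) triplets with these column names are typically fed to `MultipleNegativesRankingLoss` in sentence-transformers v3-style training. The toy rows and trainer wiring are illustrative assumptions, not the exact training script; only the loss parameters and columns above come from this card.
+
+```python
+from datasets import Dataset
+from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
+from sentence_transformers.losses import MultipleNegativesRankingLoss
+
+model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)
+
+# Toy stand-ins for the 816,532 (anchor, positive, negative) samples.
+train_dataset = Dataset.from_dict({
+    "anchor": ["별의 나이는 어떻게 측정하는가?"],
+    "positive": ["별의 나이는 스펙트럼선들의 상대적인 힘을 측정하여 추측한다."],
+    "negative": ["가장 온도가 낮은 백색왜성의 나이는 우주 나이의 하한선을 준다."],
+})
+
+# scale=20.0 with the default cosine similarity, matching the parameters above;
+# in-batch negatives are used in addition to the explicit `negative` column.
+loss = MultipleNegativesRankingLoss(model, scale=20.0)
+
+trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
+trainer.train()
+```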
+
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+
+- `per_device_train_batch_size`: 40
+- `gradient_accumulation_steps`: 4
+- `learning_rate`: 0.0001
+- `adam_epsilon`: 1e-07
+- `num_train_epochs`: 1
+- `warmup_ratio`: 0.1
+- `bf16`: True
+- `batch_sampler`: no_duplicates
+
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: no
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 40
+- `per_device_eval_batch_size`: 8
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 4
+- `eval_accumulation_steps`: None
+- `torch_empty_cache_steps`: None
+- `learning_rate`: 0.0001
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-07
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 1
+- `max_steps`: -1
+- `lr_scheduler_type`: linear
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: True
+- `fp16`: False
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: None
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: True
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: False
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: False
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`: 
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `dispatch_batches`: None
+- `split_batches`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `eval_on_start`: False
+- `eval_use_gather_object`: False
+- `batch_sampler`: no_duplicates
+- `multi_dataset_batch_sampler`: proportional
+
+</details>
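+
+As a rough guide, the non-default values above map onto `SentenceTransformerTrainingArguments` as in this minimal sketch (assuming sentence-transformers v3; `output_dir` is a hypothetical path):
+
+```python
+from sentence_transformers import SentenceTransformerTrainingArguments
+from sentence_transformers.training_args import BatchSamplers
+
+args = SentenceTransformerTrainingArguments(
+    output_dir="output/gte-base-answerableHN",  # hypothetical
+    per_device_train_batch_size=40,
+    gradient_accumulation_steps=4,
+    learning_rate=1e-4,
+    adam_epsilon=1e-7,
+    num_train_epochs=1,
+    warmup_ratio=0.1,
+    bf16=True,
+    # avoids duplicate texts acting as false in-batch negatives for MNRL
+    batch_sampler=BatchSamplers.NO_DUPLICATES,
+)
+```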
+
+### Training Logs
+<details><summary>Click to expand</summary>
+
+| Epoch | Step | Training Loss |
+|:------:|:----:|:-------------:|
+| 0.0008 | 1 | 0.4813 |
+| 0.0016 | 2 | 0.5643 |
+| 0.0024 | 3 | 0.4872 |
+| 0.0031 | 4 | 0.3838 |
+| 0.0039 | 5 | 0.4269 |
+| 0.0047 | 6 | 0.434 |
+| 0.0055 | 7 | 0.5153 |
+| 0.0063 | 8 | 0.4429 |
+| 0.0071 | 9 | 0.4464 |
+| 0.0078 | 10 | 0.4187 |
+| 0.0086 | 11 | 0.468 |
+| 0.0094 | 12 | 0.402 |
+| 0.0102 | 13 | 0.3745 |
+| 0.0110 | 14 | 0.3623 |
+| 0.0118 | 15 | 0.3358 |
+| 0.0125 | 16 | 0.3927 |
+| 0.0133 | 17 | 0.4539 |
+| 0.0141 | 18 | 0.3177 |
+| 0.0149 | 19 | 0.2902 |
+| 0.0157 | 20 | 0.3559 |
+| 0.0165 | 21 | 0.2641 |
+| 0.0172 | 22 | 0.2968 |
+| 0.0180 | 23 | 0.2008 |
+| 0.0188 | 24 | 0.2742 |
+| 0.0196 | 25 | 0.3565 |
+| 0.0204 | 26 | 0.2706 |
+| 0.0212 | 27 | 0.2544 |
+| 0.0219 | 28 | 0.2721 |
+| 0.0227 | 29 | 0.2795 |
+| 0.0235 | 30 | 0.2647 |
+| 0.0243 | 31 | 0.164 |
+| 0.0251 | 32 | 0.2574 |
+| 0.0259 | 33 | 0.1962 |
+| 0.0267 | 34 | 0.2739 |
+| 0.0274 | 35 | 0.2286 |
+| 0.0282 | 36 | 0.2376 |
+| 0.0290 | 37 | 0.3125 |
+| 0.0298 | 38 | 0.2401 |
+| 0.0306 | 39 | 0.1922 |
+| 0.0314 | 40 | 0.2479 |
+| 0.0321 | 41 | 0.1851 |
+| 0.0329 | 42 | 0.1813 |
+| 0.0337 | 43 | 0.2471 |
+| 0.0345 | 44 | 0.2561 |
+| 0.0353 | 45 | 0.2568 |
+| 0.0361 | 46 | 0.3049 |
+| 0.0368 | 47 | 0.2404 |
+| 0.0376 | 48 | 0.231 |
+| 0.0384 | 49 | 0.261 |
+| 0.0392 | 50 | 0.2581 |
+| 0.0400 | 51 | 0.2184 |
+| 0.0408 | 52 | 0.2002 |
+| 0.0415 | 53 | 0.2586 |
+| 0.0423 | 54 | 0.1532 |
+| 0.0431 | 55 | 0.2023 |
+| 0.0439 | 56 | 0.2272 |
+| 0.0447 | 57 | 0.2207 |
+| 0.0455 | 58 | 0.2364 |
+| 0.0462 | 59 | 0.2044 |
+| 0.0470 | 60 | 0.2387 |
+| 0.0478 | 61 | 0.2289 |
+| 0.0486 | 62 | 0.1616 |
+| 0.0494 | 63 | 0.1753 |
+| 0.0502 | 64 | 0.1803 |
+| 0.0510 | 65 | 0.2033 |
+| 0.0517 | 66 | 0.2061 |
+| 0.0525 | 67 | 0.2128 |
+| 0.0533 | 68 | 0.2046 |
+| 0.0541 | 69 | 0.1685 |
+| 0.0549 | 70 | 0.1985 |
+| 0.0557 | 71 | 0.1713 |
+| 0.0564 | 72 | 0.21 |
+| 0.0572 | 73 | 0.2085 |
+| 0.0580 | 74 | 0.2144 |
+| 0.0588 | 75 | 0.2099 |
+| 0.0596 | 76 | 0.223 |
+| 0.0604 | 77 | 0.2342 |
+| 0.0611 | 78 | 0.2327 |
+| 0.0619 | 79 | 0.1812 |
+| 0.0627 | 80 | 0.2068 |
+| 0.0635 | 81 | 0.1826 |
+| 0.0643 | 82 | 0.1792 |
+| 0.0651 | 83 | 0.2363 |
+| 0.0658 | 84 | 0.1842 |
+| 0.0666 | 85 | 0.1673 |
+| 0.0674 | 86 | 0.2068 |
+| 0.0682 | 87 | 0.2386 |
+| 0.0690 | 88 | 0.1905 |
+| 0.0698 | 89 | 0.22 |
+| 0.0705 | 90 | 0.2351 |
+| 0.0713 | 91 | 0.2444 |
+| 0.0721 | 92 | 0.1984 |
+| 0.0729 | 93 | 0.1823 |
+| 0.0737 | 94 | 0.201 |
+| 0.0745 | 95 | 0.1548 |
+| 0.0752 | 96 | 0.1824 |
+| 0.0760 | 97 | 0.2315 |
+| 0.0768 | 98 | 0.2042 |
+| 0.0776 | 99 | 0.1579 |
+| 0.0784 | 100 | 0.1906 |
+| 0.0792 | 101 | 0.2058 |
+| 0.0800 | 102 | 0.2094 |
+| 0.0807 | 103 | 0.2149 |
+| 0.0815 | 104 | 0.2138 |
+| 0.0823 | 105 | 0.1932 |
+| 0.0831 | 106 | 0.1874 |
+| 0.0839 | 107 | 0.1945 |
+| 0.0847 | 108 | 0.1705 |
+| 0.0854 | 109 | 0.1832 |
+| 0.0862 | 110 | 0.2075 |
+| 0.0870 | 111 | 0.1586 |
+| 0.0878 | 112 | 0.139 |
+| 0.0886 | 113 | 0.1496 |
+| 0.0894 | 114 | 0.1843 |
+| 0.0901 | 115 | 0.2377 |
+| 0.0909 | 116 | 0.1998 |
+| 0.0917 | 117 | 0.1491 |
+| 0.0925 | 118 | 0.1763 |
+| 0.0933 | 119 | 0.128 |
+| 0.0941 | 120 | 0.1595 |
+| 0.0948 | 121 | 0.1816 |
+| 0.0956 | 122 | 0.2252 |
+| 0.0964 | 123 | 0.1829 |
+| 0.0972 | 124 | 0.1505 |
+| 0.0980 | 125 | 0.1726 |
+| 0.0988 | 126 | 0.2009 |
+| 0.0995 | 127 | 0.2219 |
+| 0.1003 | 128 | 0.1384 |
+| 0.1011 | 129 | 0.1243 |
+| 0.1019 | 130 | 0.2139 |
+| 0.1027 | 131 | 0.1677 |
+| 0.1035 | 132 | 0.1957 |
+| 0.1043 | 133 | 0.1683 |
+| 0.1050 | 134 | 0.168 |
+| 0.1058 | 135 | 0.2021 |
+| 0.1066 | 136 | 0.2112 |
+| 0.1074 | 137 | 0.2093 |
+| 0.1082 | 138 | 0.2279 |
+| 0.1090 | 139 | 0.2001 |
+| 0.1097 | 140 | 0.179 |
+| 0.1105 | 141 | 0.1954 |
+| 0.1113 | 142 | 0.172 |
+| 0.1121 | 143 | 0.1969 |
+| 0.1129 | 144 | 0.1561 |
+| 0.1137 | 145 | 0.1802 |
+| 0.1144 | 146 | 0.1885 |
+| 0.1152 | 147 | 0.1438 |
+| 0.1160 | 148 | 0.1791 |
+| 0.1168 | 149 | 0.1905 |
+| 0.1176 | 150 | 0.2506 |
+| 0.1184 | 151 | 0.2024 |
+| 0.1191 | 152 | 0.2059 |
+| 0.1199 | 153 | 0.2393 |
+| 0.1207 | 154 | 0.1531 |
+| 0.1215 | 155 | 0.1888 |
+| 0.1223 | 156 | 0.1831 |
+| 0.1231 | 157 | 0.1378 |
+| 0.1238 | 158 | 0.1553 |
+| 0.1246 | 159 | 0.2004 |
+| 0.1254 | 160 | 0.2071 |
+| 0.1262 | 161 | 0.1909 |
+| 0.1270 | 162 | 0.1763 |
+| 0.1278 | 163 | 0.1914 |
+| 0.1286 | 164 | 0.1365 |
+| 0.1293 | 165 | 0.2272 |
+| 0.1301 | 166 | 0.1484 |
+| 0.1309 | 167 | 0.2181 |
+| 0.1317 | 168 | 0.2386 |
+| 0.1325 | 169 | 0.2005 |
+| 0.1333 | 170 | 0.1757 |
+| 0.1340 | 171 | 0.1679 |
+| 0.1348 | 172 | 0.1707 |
+| 0.1356 | 173 | 0.1448 |
+| 0.1364 | 174 | 0.1703 |
+| 0.1372 | 175 | 0.1739 |
+| 0.1380 | 176 | 0.1376 |
+| 0.1387 | 177 | 0.1906 |
+| 0.1395 | 178 | 0.2542 |
+| 0.1403 | 179 | 0.1438 |
+| 0.1411 | 180 | 0.1786 |
+| 0.1419 | 181 | 0.1838 |
+| 0.1427 | 182 | 0.1592 |
+| 0.1434 | 183 | 0.1991 |
+| 0.1442 | 184 | 0.1702 |
+| 0.1450 | 185 | 0.1787 |
+| 0.1458 | 186 | 0.1631 |
+| 0.1466 | 187 | 0.2697 |
+| 0.1474 | 188 | 0.1654 |
+| 0.1481 | 189 | 0.2037 |
+| 0.1489 | 190 | 0.1751 |
+| 0.1497 | 191 | 0.212 |
+| 0.1505 | 192 | 0.1531 |
+| 0.1513 | 193 | 0.1802 |
+| 0.1521 | 194 | 0.1421 |
+| 0.1529 | 195 | 0.236 |
+| 0.1536 | 196 | 0.1702 |
+| 0.1544 | 197 | 0.1869 |
+| 0.1552 | 198 | 0.1796 |
+| 0.1560 | 199 | 0.1537 |
+| 0.1568 | 200 | 0.1646 |
+| 0.1576 | 201 | 0.1603 |
+| 0.1583 | 202 | 0.1662 |
+| 0.1591 | 203 | 0.1323 |
+| 0.1599 | 204 | 0.1672 |
+| 0.1607 | 205 | 0.2217 |
+| 0.1615 | 206 | 0.144 |
+| 0.1623 | 207 | 0.1889 |
+| 0.1630 | 208 | 0.159 |
+| 0.1638 | 209 | 0.1298 |
+| 0.1646 | 210 | 0.1245 |
+| 0.1654 | 211 | 0.1815 |
+| 0.1662 | 212 | 0.1771 |
+| 0.1670 | 213 | 0.1441 |
+| 0.1677 | 214 | 0.1834 |
+| 0.1685 | 215 | 0.1997 |
+| 0.1693 | 216 | 0.203 |
+| 0.1701 | 217 | 0.1986 |
+| 0.1709 | 218 | 0.1965 |
+| 0.1717 | 219 | 0.1682 |
+| 0.1724 | 220 | 0.1485 |
+| 0.1732 | 221 | 0.1531 |
+| 0.1740 | 222 | 0.16 |
+| 0.1748 | 223 | 0.1554 |
+| 0.1756 | 224 | 0.1705 |
+| 0.1764 | 225 | 0.1771 |
+| 0.1772 | 226 | 0.1507 |
+| 0.1779 | 227 | 0.1623 |
+| 0.1787 | 228 | 0.1527 |
+| 0.1795 | 229 | 0.1332 |
+| 0.1803 | 230 | 0.1556 |
+| 0.1811 | 231 | 0.1504 |
+| 0.1819 | 232 | 0.1581 |
+| 0.1826 | 233 | 0.15 |
+| 0.1834 | 234 | 0.2012 |
+| 0.1842 | 235 | 0.1587 |
+| 0.1850 | 236 | 0.2141 |
+| 0.1858 | 237 | 0.1431 |
+| 0.1866 | 238 | 0.1092 |
+| 0.1873 | 239 | 0.1688 |
+| 0.1881 | 240 | 0.2185 |
+| 0.1889 | 241 | 0.2071 |
+| 0.1897 | 242 | 0.1575 |
+| 0.1905 | 243 | 0.1251 |
+| 0.1913 | 244 | 0.1692 |
+| 0.1920 | 245 | 0.1746 |
+| 0.1928 | 246 | 0.2024 |
+| 0.1936 | 247 | 0.2074 |
+| 0.1944 | 248 | 0.2422 |
+| 0.1952 | 249 | 0.1994 |
+| 0.1960 | 250 | 0.1672 |
+| 0.1967 | 251 | 0.1474 |
+| 0.1975 | 252 | 0.1888 |
+| 0.1983 | 253 | 0.2173 |
+| 0.1991 | 254 | 0.1448 |
+| 0.1999 | 255 | 0.2403 |
+| 0.2007 | 256 | 0.1652 |
+| 0.2015 | 257 | 0.1929 |
+| 0.2022 | 258 | 0.1272 |
+| 0.2030 | 259 | 0.193 |
+| 0.2038 | 260 | 0.1665 |
+| 0.2046 | 261 | 0.1677 |
+| 0.2054 | 262 | 0.1558 |
+| 0.2062 | 263 | 0.1825 |
+| 0.2069 | 264 | 0.1549 |
+| 0.2077 | 265 | 0.199 |
+| 0.2085 | 266 | 0.1495 |
+| 0.2093 | 267 | 0.1478 |
+| 0.2101 | 268 | 0.168 |
+| 0.2109 | 269 | 0.1015 |
+| 0.2116 | 270 | 0.1924 |
+| 0.2124 | 271 | 0.1397 |
+| 0.2132 | 272 | 0.1449 |
+| 0.2140 | 273 | 0.1797 |
+| 0.2148 | 274 | 0.1689 |
+| 0.2156 | 275 | 0.1738 |
+| 0.2163 | 276 | 0.1758 |
+| 0.2171 | 277 | 0.1298 |
+| 0.2179 | 278 | 0.1889 |
+| 0.2187 | 279 | 0.1377 |
+| 0.2195 | 280 | 0.1592 |
+| 0.2203 | 281 | 0.1506 |
+| 0.2210 | 282 | 0.1622 |
+| 0.2218 | 283 | 0.1484 |
+| 0.2226 | 284 | 0.1493 |
+| 0.2234 | 285 | 0.1305 |
+| 0.2242 | 286 | 0.1131 |
+| 0.2250 | 287 | 0.1466 |
+| 0.2257 | 288 | 0.1267 |
+| 0.2265 | 289 | 0.1426 |
+| 0.2273 | 290 | 0.1649 |
+| 0.2281 | 291 | 0.1263 |
+| 0.2289 | 292 | 0.2029 |
+| 0.2297 | 293 | 0.1845 |
+| 0.2305 | 294 | 0.1364 |
+| 0.2312 | 295 | 0.1688 |
+| 0.2320 | 296 | 0.2093 |
+| 0.2328 | 297 | 0.1605 |
+| 0.2336 | 298 | 0.1206 |
+| 0.2344 | 299 | 0.2165 |
+| 0.2352 | 300 | 0.2139 |
+| 0.2359 | 301 | 0.1673 |
+| 0.2367 | 302 | 0.1455 |
+| 0.2375 | 303 | 0.1617 |
+| 0.2383 | 304 | 0.1663 |
+| 0.2391 | 305 | 0.1649 |
+| 0.2399 | 306 | 0.1358 |
+| 0.2406 | 307 | 0.1746 |
+| 0.2414 | 308 | 0.1664 |
+| 0.2422 | 309 | 0.1135 |
+| 0.2430 | 310 | 0.1612 |
+| 0.2438 | 311 | 0.1529 |
+| 0.2446 | 312 | 0.1367 |
+| 0.2453 | 313 | 0.1709 |
+| 0.2461 | 314 | 0.1757 |
+| 0.2469 | 315 | 0.1885 |
+| 0.2477 | 316 | 0.1792 |
+| 0.2485 | 317 | 0.1195 |
+| 0.2493 | 318 | 0.1451 |
+| 0.2500 | 319 | 0.1684 |
+| 0.2508 | 320 | 0.1299 |
+| 0.2516 | 321 | 0.1867 |
+| 0.2524 | 322 | 0.1899 |
+| 0.2532 | 323 | 0.1329 |
+| 0.2540 | 324 | 0.1403 |
+| 0.2548 | 325 | 0.1862 |
+| 0.2555 | 326 | 0.1407 |
+| 0.2563 | 327 | 0.1756 |
+| 0.2571 | 328 | 0.1465 |
+| 0.2579 | 329 | 0.1638 |
+| 0.2587 | 330 | 0.1506 |
+| 0.2595 | 331 | 0.1431 |
+| 0.2602 | 332 | 0.1975 |
+| 0.2610 | 333 | 0.1678 |
+| 0.2618 | 334 | 0.1695 |
+| 0.2626 | 335 | 0.1905 |
+| 0.2634 | 336 | 0.1754 |
+| 0.2642 | 337 | 0.145 |
+| 0.2649 | 338 | 0.1787 |
+| 0.2657 | 339 | 0.1464 |
+| 0.2665 | 340 | 0.1598 |
+| 0.2673 | 341 | 0.1159 |
+| 0.2681 | 342 | 0.1573 |
+| 0.2689 | 343 | 0.2009 |
+| 0.2696 | 344 | 0.2046 |
+| 0.2704 | 345 | 0.1523 |
+| 0.2712 | 346 | 0.1293 |
+| 0.2720 | 347 | 0.1614 |
+| 0.2728 | 348 | 0.1538 |
+| 0.2736 | 349 | 0.1418 |
+| 0.2743 | 350 | 0.158 |
+| 0.2751 | 351 | 0.1443 |
+| 0.2759 | 352 | 0.1437 |
+| 0.2767 | 353 | 0.1506 |
+| 0.2775 | 354 | 0.1452 |
+| 0.2783 | 355 | 0.1637 |
+| 0.2791 | 356 | 0.1015 |
+| 0.2798 | 357 | 0.1531 |
+| 0.2806 | 358 | 0.162 |
+| 0.2814 | 359 | 0.1166 |
+| 0.2822 | 360 | 0.1968 |
+| 0.2830 | 361 | 0.1828 |
+| 0.2838 | 362 | 0.1281 |
+| 0.2845 | 363 | 0.1738 |
+| 0.2853 | 364 | 0.1785 |
+| 0.2861 | 365 | 0.1475 |
+| 0.2869 | 366 | 0.179 |
+| 0.2877 | 367 | 0.1322 |
+| 0.2885 | 368 | 0.234 |
+| 0.2892 | 369 | 0.1465 |
+| 0.2900 | 370 | 0.125 |
+| 0.2908 | 371 | 0.1945 |
+| 0.2916 | 372 | 0.1728 |
+| 0.2924 | 373 | 0.1246 |
+| 0.2932 | 374 | 0.1662 |
+| 0.2939 | 375 | 0.1881 |
+| 0.2947 | 376 | 0.1409 |
+| 0.2955 | 377 | 0.188 |
+| 0.2963 | 378 | 0.1482 |
+| 0.2971 | 379 | 0.1451 |
+| 0.2979 | 380 | 0.1562 |
+| 0.2986 | 381 | 0.1606 |
+| 0.2994 | 382 | 0.1437 |
+| 0.3002 | 383 | 0.1271 |
+| 0.3010 | 384 | 0.1796 |
+| 0.3018 | 385 | 0.14 |
+| 0.3026 | 386 | 0.1645 |
+| 0.3034 | 387 | 0.1589 |
+| 0.3041 | 388 | 0.1668 |
+| 0.3049 | 389 | 0.1176 |
+| 0.3057 | 390 | 0.1651 |
+| 0.3065 | 391 | 0.1425 |
+| 0.3073 | 392 | 0.194 |
+| 0.3081 | 393 | 0.13 |
+| 0.3088 | 394 | 0.1302 |
+| 0.3096 | 395 | 0.1224 |
+| 0.3104 | 396 | 0.1249 |
+| 0.3112 | 397 | 0.1821 |
+| 0.3120 | 398 | 0.1551 |
+| 0.3128 | 399 | 0.1444 |
+| 0.3135 | 400 | 0.1841 |
+| 0.3143 | 401 | 0.1276 |
+| 0.3151 | 402 | 0.1733 |
+| 0.3159 | 403 | 0.1595 |
+| 0.3167 | 404 | 0.2037 |
+| 0.3175 | 405 | 0.1601 |
+| 0.3182 | 406 | 0.1501 |
+| 0.3190 | 407 | 0.1467 |
+| 0.3198 | 408 | 0.1194 |
+| 0.3206 | 409 | 0.1532 |
+| 0.3214 | 410 | 0.1292 |
+| 0.3222 | 411 | 0.1576 |
+| 0.3229 | 412 | 0.1431 |
+| 0.3237 | 413 | 0.151 |
+| 0.3245 | 414 | 0.1024 |
+| 0.3253 | 415 | 0.1696 |
+| 0.3261 | 416 | 0.129 |
+| 0.3269 | 417 | 0.1934 |
+| 0.3277 | 418 | 0.2072 |
+| 0.3284 | 419 | 0.1387 |
+| 0.3292 | 420 | 0.146 |
+| 0.3300 | 421 | 0.1325 |
+| 0.3308 | 422 | 0.1555 |
+| 0.3316 | 423 | 0.1281 |
+| 0.3324 | 424 | 0.1869 |
+| 0.3331 | 425 | 0.1802 |
+| 0.3339 | 426 | 0.1774 |
+| 0.3347 | 427 | 0.1495 |
+| 0.3355 | 428 | 0.1022 |
+| 0.3363 | 429 | 0.1546 |
+| 0.3371 | 430 | 0.1512 |
+| 0.3378 | 431 | 0.1734 |
+| 0.3386 | 432 | 0.1285 |
+| 0.3394 | 433 | 0.1562 |
+| 0.3402 | 434 | 0.1437 |
+| 0.3410 | 435 | 0.1485 |
+| 0.3418 | 436 | 0.1443 |
+| 0.3425 | 437 | 0.1304 |
+| 0.3433 | 438 | 0.1479 |
+| 0.3441 | 439 | 0.1544 |
+| 0.3449 | 440 | 0.1947 |
+| 0.3457 | 441 | 0.1685 |
+| 0.3465 | 442 | 0.1715 |
+| 0.3472 | 443 | 0.1269 |
+| 0.3480 | 444 | 0.1739 |
+| 0.3488 | 445 | 0.1798 |
+| 0.3496 | 446 | 0.1329 |
+| 0.3504 | 447 | 0.1737 |
+| 0.3512 | 448 | 0.1197 |
+| 0.3519 | 449 | 0.1326 |
+| 0.3527 | 450 | 0.131 |
+| 0.3535 | 451 | 0.1498 |
+| 0.3543 | 452 | 0.1836 |
+| 0.3551 | 453 | 0.115 |
+| 0.3559 | 454 | 0.1766 |
+| 0.3567 | 455 | 0.1289 |
+| 0.3574 | 456 | 0.1359 |
+| 0.3582 | 457 | 0.1245 |
+| 0.3590 | 458 | 0.1793 |
+| 0.3598 | 459 | 0.1615 |
+| 0.3606 | 460 | 0.1122 |
+| 0.3614 | 461 | 0.1767 |
+| 0.3621 | 462 | 0.1464 |
+| 0.3629 | 463 | 0.1377 |
+| 0.3637 | 464 | 0.1341 |
+| 0.3645 | 465 | 0.1511 |
+| 0.3653 | 466 | 0.1444 |
+| 0.3661 | 467 | 0.1407 |
+| 0.3668 | 468 | 0.1602 |
+| 0.3676 | 469 | 0.1352 |
+| 0.3684 | 470 | 0.1203 |
+| 0.3692 | 471 | 0.1367 |
+| 0.3700 | 472 | 0.1554 |
+| 0.3708 | 473 | 0.1006 |
+| 0.3715 | 474 | 0.1499 |
+| 0.3723 | 475 | 0.1324 |
+| 0.3731 | 476 | 0.1654 |
+| 0.3739 | 477 | 0.1509 |
+| 0.3747 | 478 | 0.1237 |
+| 0.3755 | 479 | 0.1298 |
+| 0.3762 | 480 | 0.1403 |
+| 0.3770 | 481 | 0.1314 |
+| 0.3778 | 482 | 0.1704 |
+| 0.3786 | 483 | 0.1285 |
+| 0.3794 | 484 | 0.1896 |
+| 0.3802 | 485 | 0.1358 |
+| 0.3810 | 486 | 0.1065 |
+| 0.3817 | 487 | 0.1382 |
+| 0.3825 | 488 | 0.1372 |
+| 0.3833 | 489 | 0.1215 |
+| 0.3841 | 490 | 0.2131 |
+| 0.3849 | 491 | 0.1512 |
+| 0.3857 | 492 | 0.1323 |
+| 0.3864 | 493 | 0.1398 |
+| 0.3872 | 494 | 0.151 |
+| 0.3880 | 495 | 0.1297 |
+| 0.3888 | 496 | 0.1852 |
+| 0.3896 | 497 | 0.1044 |
+| 0.3904 | 498 | 0.1185 |
+| 0.3911 | 499 | 0.1724 |
+| 0.3919 | 500 | 0.097 |
+| 0.3927 | 501 | 0.1486 |
+| 0.3935 | 502 | 0.1124 |
+| 0.3943 | 503 | 0.1264 |
+| 0.3951 | 504 | 0.0993 |
+| 0.3958 | 505 | 0.1369 |
+| 0.3966 | 506 | 0.1587 |
+| 0.3974 | 507 | 0.1455 |
+| 0.3982 | 508 | 0.1236 |
+| 0.3990 | 509 | 0.1547 |
+| 0.3998 | 510 | 0.1286 |
+| 0.4005 | 511 | 0.1257 |
+| 0.4013 | 512 | 0.1452 |
+| 0.4021 | 513 | 0.1595 |
+| 0.4029 | 514 | 0.1479 |
+| 0.4037 | 515 | 0.166 |
+| 0.4045 | 516 | 0.1623 |
+| 0.4053 | 517 | 0.136 |
+| 0.4060 | 518 | 0.149 |
+| 0.4068 | 519 | 0.1496 |
+| 0.4076 | 520 | 0.1154 |
+| 0.4084 | 521 | 0.1493 |
+| 0.4092 | 522 | 0.113 |
+| 0.4100 | 523 | 0.137 |
+| 0.4107 | 524 | 0.2077 |
+| 0.4115 | 525 | 0.112 |
+| 0.4123 | 526 | 0.1491 |
+| 0.4131 | 527 | 0.1608 |
+| 0.4139 | 528 | 0.1446 |
+| 0.4147 | 529 | 0.1188 |
+| 0.4154 | 530 | 0.137 |
+| 0.4162 | 531 | 0.1072 |
+| 0.4170 | 532 | 0.088 |
+| 0.4178 | 533 | 0.1182 |
+| 0.4186 | 534 | 0.2556 |
+| 0.4194 | 535 | 0.1907 |
+| 0.4201 | 536 | 0.1156 |
+| 0.4209 | 537 | 0.1676 |
+| 0.4217 | 538 | 0.1236 |
+| 0.4225 | 539 | 0.1009 |
+| 0.4233 | 540 | 0.1567 |
+| 0.4241 | 541 | 0.2222 |
+| 0.4248 | 542 | 0.148 |
+| 0.4256 | 543 | 0.1182 |
+| 0.4264 | 544 | 0.1267 |
+| 0.4272 | 545 | 0.127 |
+| 0.4280 | 546 | 0.1372 |
+| 0.4288 | 547 | 0.1299 |
+| 0.4296 | 548 | 0.1711 |
+| 0.4303 | 549 | 0.1608 |
+| 0.4311 | 550 | 0.1278 |
+| 0.4319 | 551 | 0.106 |
+| 0.4327 | 552 | 0.1494 |
+| 0.4335 | 553 | 0.1093 |
+| 0.4343 | 554 | 0.1833 |
+| 0.4350 | 555 | 0.1876 |
+| 0.4358 | 556 | 0.1774 |
+| 0.4366 | 557 | 0.1443 |
+| 0.4374 | 558 | 0.1351 |
+| 0.4382 | 559 | 0.1094 |
+| 0.4390 | 560 | 0.1485 |
+| 0.4397 | 561 | 0.1156 |
+| 0.4405 | 562 | 0.1324 |
+| 0.4413 | 563 | 0.1314 |
+| 0.4421 | 564 | 0.1601 |
+| 0.4429 | 565 | 0.1434 |
+| 0.4437 | 566 | 0.1785 |
+| 0.4444 | 567 | 0.1044 |
+| 0.4452 | 568 | 0.1123 |
+| 0.4460 | 569 | 0.1235 |
+| 0.4468 | 570 | 0.1384 |
+| 0.4476 | 571 | 0.1357 |
+| 0.4484 | 572 | 0.1357 |
+| 0.4491 | 573 | 0.1276 |
+| 0.4499 | 574 | 0.1554 |
+| 0.4507 | 575 | 0.1235 |
+| 0.4515 | 576 | 0.1319 |
+| 0.4523 | 577 | 0.1862 |
+| 0.4531 | 578 | 0.1523 |
+| 0.4539 | 579 | 0.1224 |
+| 0.4546 | 580 | 0.1629 |
+| 0.4554 | 581 | 0.1113 |
+| 0.4562 | 582 | 0.1261 |
+| 0.4570 | 583 | 0.1246 |
+| 0.4578 | 584 | 0.1461 |
+| 0.4586 | 585 | 0.1831 |
+| 0.4593 | 586 | 0.138 |
+| 0.4601 | 587 | 0.1206 |
+| 0.4609 | 588 | 0.1269 |
+| 0.4617 | 589 | 0.1512 |
+| 0.4625 | 590 | 0.1131 |
+| 0.4633 | 591 | 0.1206 |
+| 0.4640 | 592 | 0.1555 |
+| 0.4648 | 593 | 0.1404 |
+| 0.4656 | 594 | 0.101 |
+| 0.4664 | 595 | 0.0881 |
+| 0.4672 | 596 | 0.1793 |
+| 0.4680 | 597 | 0.0995 |
+| 0.4687 | 598 | 0.1369 |
+| 0.4695 | 599 | 0.141 |
+| 0.4703 | 600 | 0.1494 |
+| 0.4711 | 601 | 0.1824 |
+| 0.4719 | 602 | 0.1671 |
+| 0.4727 | 603 | 0.1805 |
+| 0.4734 | 604 | 0.1475 |
+| 0.4742 | 605 | 0.1128 |
+| 0.4750 | 606 | 0.1748 |
+| 0.4758 | 607 | 0.1564 |
+| 0.4766 | 608 | 0.0922 |
+| 0.4774 | 609 | 0.1008 |
+| 0.4782 | 610 | 0.1324 |
+| 0.4789 | 611 | 0.1022 |
+| 0.4797 | 612 | 0.1604 |
+| 0.4805 | 613 | 0.145 |
+| 0.4813 | 614 | 0.1621 |
+| 0.4821 | 615 | 0.15 |
+| 0.4829 | 616 | 0.1092 |
+| 0.4836 | 617 | 0.1239 |
+| 0.4844 | 618 | 0.1352 |
+| 0.4852 | 619 | 0.1098 |
+| 0.4860 | 620 | 0.1341 |
+| 0.4868 | 621 | 0.1538 |
+| 0.4876 | 622 | 0.1146 |
+| 0.4883 | 623 | 0.1498 |
+| 0.4891 | 624 | 0.1358 |
+| 0.4899 | 625 | 0.1571 |
+| 0.4907 | 626 | 0.1508 |
+| 0.4915 | 627 | 0.1424 |
+| 0.4923 | 628 | 0.1731 |
+| 0.4930 | 629 | 0.1398 |
+| 0.4938 | 630 | 0.1234 |
+| 0.4946 | 631 | 0.1409 |
+| 0.4954 | 632 | 0.136 |
+| 0.4962 | 633 | 0.1294 |
+| 0.4970 | 634 | 0.1612 |
+| 0.4977 | 635 | 0.1597 |
+| 0.4985 | 636 | 0.1685 |
+| 0.4993 | 637 | 0.1723 |
+| 0.5001 | 638 | 0.1643 |
+| 0.5009 | 639 | 0.1831 |
+| 0.5017 | 640 | 0.0791 |
+| 0.5024 | 641 | 0.1109 |
+| 0.5032 | 642 | 0.1189 |
+| 0.5040 | 643 | 0.1484 |
+| 0.5048 | 644 | 0.1399 |
+| 0.5056 | 645 | 0.1519 |
+| 0.5064 | 646 | 0.1182 |
+| 0.5072 | 647 | 0.1969 |
+| 0.5079 | 648 | 0.1729 |
+| 0.5087 | 649 | 0.1119 |
+| 0.5095 | 650 | 0.099 |
+| 0.5103 | 651 | 0.1265 |
+| 0.5111 | 652 | 0.1068 |
+| 0.5119 | 653 | 0.173 |
+| 0.5126 | 654 | 0.1059 |
+| 0.5134 | 655 | 0.1622 |
+| 0.5142 | 656 | 0.1787 |
+| 0.5150 | 657 | 0.2004 |
+| 0.5158 | 658 | 0.1282 |
+| 0.5166 | 659 | 0.1218 |
+| 0.5173 | 660 | 0.1457 |
+| 0.5181 | 661 | 0.0966 |
+| 0.5189 | 662 | 0.1101 |
+| 0.5197 | 663 | 0.1581 |
+| 0.5205 | 664 | 0.1162 |
+| 0.5213 | 665 | 0.1724 |
+| 0.5220 | 666 | 0.1455 |
+| 0.5228 | 667 | 0.1586 |
+| 0.5236 | 668 | 0.1283 |
+| 0.5244 | 669 | 0.1475 |
+| 0.5252 | 670 | 0.1136 |
+| 0.5260 | 671 | 0.1461 |
+| 0.5267 | 672 | 0.1789 |
+| 0.5275 | 673 | 0.1617 |
+| 0.5283 | 674 | 0.1344 |
+| 0.5291 | 675 | 0.1603 |
+| 0.5299 | 676 | 0.1529 |
+| 0.5307 | 677 | 0.1135 |
+| 0.5315 | 678 | 0.1312 |
+| 0.5322 | 679 | 0.1493 |
+| 0.5330 | 680 | 0.158 |
+| 0.5338 | 681 | 0.1032 |
+| 0.5346 | 682 | 0.1082 |
+| 0.5354 | 683 | 0.1043 |
+| 0.5362 | 684 | 0.1127 |
+| 0.5369 | 685 | 0.105 |
+| 0.5377 | 686 | 0.1703 |
+| 0.5385 | 687 | 0.1805 |
+| 0.5393 | 688 | 0.1098 |
+| 0.5401 | 689 | 0.1161 |
+| 0.5409 | 690 | 0.107 |
+| 0.5416 | 691 | 0.1619 |
+| 0.5424 | 692 | 0.1076 |
+| 0.5432 | 693 | 0.1248 |
+| 0.5440 | 694 | 0.117 |
+| 0.5448 | 695 | 0.1158 |
+| 0.5456 | 696 | 0.1665 |
+| 0.5463 | 697 | 0.1261 |
+| 0.5471 | 698 | 0.1074 |
+| 0.5479 | 699 | 0.1018 |
+| 0.5487 | 700 | 0.1425 |
+| 0.5495 | 701 | 0.1119 |
+| 0.5503 | 702 | 0.1608 |
+| 0.5510 | 703 | 0.1732 |
+| 0.5518 | 704 | 0.1324 |
+| 0.5526 | 705 | 0.1151 |
+| 0.5534 | 706 | 0.1368 |
+| 0.5542 | 707 | 0.1507 |
+| 0.5550 | 708 | 0.1703 |
+| 0.5558 | 709 | 0.1286 |
+| 0.5565 | 710 | 0.1305 |
+| 0.5573 | 711 | 0.1771 |
+| 0.5581 | 712 | 0.1106 |
+| 0.5589 | 713 | 0.1431 |
+| 0.5597 | 714 | 0.1381 |
+| 0.5605 | 715 | 0.1388 |
+| 0.5612 | 716 | 0.1536 |
+| 0.5620 | 717 | 0.1843 |
+| 0.5628 | 718 | 0.1695 |
+| 0.5636 | 719 | 0.1179 |
+| 0.5644 | 720 | 0.1113 |
+| 0.5652 | 721 | 0.0922 |
+| 0.5659 | 722 | 0.1341 |
+| 0.5667 | 723 | 0.1129 |
+| 0.5675 | 724 | 0.1344 |
+| 0.5683 | 725 | 0.1571 |
+| 0.5691 | 726 | 0.1257 |
+| 0.5699 | 727 | 0.126 |
+| 0.5706 | 728 | 0.1706 |
+| 0.5714 | 729 | 0.1245 |
+| 0.5722 | 730 | 0.1703 |
+| 0.5730 | 731 | 0.1304 |
+| 0.5738 | 732 | 0.1552 |
+| 0.5746 | 733 | 0.1036 |
+| 0.5753 | 734 | 0.1269 |
+| 0.5761 | 735 | 0.1355 |
+| 0.5769 | 736 | 0.1153 |
+| 0.5777 | 737 | 0.0923 |
+| 0.5785 | 738 | 0.1359 |
+| 0.5793 | 739 | 0.1495 |
+| 0.5801 | 740 | 0.1818 |
+| 0.5808 | 741 | 0.1325 |
+| 0.5816 | 742 | 0.1755 |
+| 0.5824 | 743 | 0.1443 |
+| 0.5832 | 744 | 0.1255 |
+| 0.5840 | 745 | 0.1248 |
+| 0.5848 | 746 | 0.1161 |
+| 0.5855 | 747 | 0.1513 |
+| 0.5863 | 748 | 0.1117 |
+| 0.5871 | 749 | 0.156 |
+| 0.5879 | 750 | 0.1238 |
+| 0.5887 | 751 | 0.1318 |
+| 0.5895 | 752 | 0.1406 |
+| 0.5902 | 753 | 0.1065 |
+| 0.5910 | 754 | 0.1227 |
+| 0.5918 | 755 | 0.1444 |
+| 0.5926 | 756 | 0.1059 |
+| 0.5934 | 757 | 0.1307 |
+| 0.5942 | 758 | 0.1253 |
+| 0.5949 | 759 | 0.0993 |
+| 0.5957 | 760 | 0.1243 |
+| 0.5965 | 761 | 0.1326 |
+| 0.5973 | 762 | 0.1638 |
+| 0.5981 | 763 | 0.1423 |
+| 0.5989 | 764 | 0.1804 |
+| 0.5996 | 765 | 0.1176 |
+| 0.6004 | 766 | 0.1022 |
+| 0.6012 | 767 | 0.1451 |
+| 0.6020 | 768 | 0.1497 |
+| 0.6028 | 769 | 0.1407 |
+| 0.6036 | 770 | 0.1235 |
+| 0.6044 | 771 | 0.1017 |
+| 0.6051 | 772 | 0.1705 |
+| 0.6059 | 773 | 0.1385 |
+| 0.6067 | 774 | 0.1194 |
+| 0.6075 | 775 | 0.1029 |
+| 0.6083 | 776 | 0.139 |
+| 0.6091 | 777 | 0.1298 |
+| 0.6098 | 778 | 0.1878 |
+| 0.6106 | 779 | 0.1353 |
+| 0.6114 | 780 | 0.1413 |
+| 0.6122 | 781 | 0.1129 |
+| 0.6130 | 782 | 0.1296 |
+| 0.6138 | 783 | 0.1532 |
+| 0.6145 | 784 | 0.1769 |
+| 0.6153 | 785 | 0.1235 |
+| 0.6161 | 786 | 0.1059 |
+| 0.6169 | 787 | 0.1224 |
+| 0.6177 | 788 | 0.1591 |
+| 0.6185 | 789 | 0.1127 |
+| 0.6192 | 790 | 0.1519 |
+| 0.6200 | 791 | 0.1473 |
+| 0.6208 | 792 | 0.0953 |
+| 0.6216 | 793 | 0.1302 |
+| 0.6224 | 794 | 0.149 |
+| 0.6232 | 795 | 0.1053 |
+| 0.6239 | 796 | 0.1712 |
+| 0.6247 | 797 | 0.1342 |
+| 0.6255 | 798 | 0.1199 |
+| 0.6263 | 799 | 0.1099 |
+| 0.6271 | 800 | 0.1545 |
+| 0.6279 | 801 | 0.1158 |
+| 0.6286 | 802 | 0.1541 |
+| 0.6294 | 803 | 0.1234 |
+| 0.6302 | 804 | 0.1451 |
+| 0.6310 | 805 | 0.1069 |
+| 0.6318 | 806 | 0.1282 |
+| 0.6326 | 807 | 0.1589 |
+| 0.6334 | 808 | 0.1358 |
+| 0.6341 | 809 | 0.1515 |
+| 0.6349 | 810 | 0.1334 |
+| 0.6357 | 811 | 0.1232 |
+| 0.6365 | 812 | 0.1612 |
+| 0.6373 | 813 | 0.1379 |
+| 0.6381 | 814 | 0.1347 |
+| 0.6388 | 815 | 0.1588 |
+| 0.6396 | 816 | 0.1173 |
+| 0.6404 | 817 | 0.1318 |
+| 0.6412 | 818 | 0.1541 |
+| 0.6420 | 819 | 0.1054 |
+| 0.6428 | 820 | 0.1117 |
+| 0.6435 | 821 | 0.1684 |
+| 0.6443 | 822 | 0.1234 |
+| 0.6451 | 823 | 0.1422 |
+| 0.6459 | 824 | 0.0979 |
+| 0.6467 | 825 | 0.1365 |
+| 0.6475 | 826 | 0.1177 |
+| 0.6482 | 827 | 0.1656 |
+| 0.6490 | 828 | 0.1288 |
+| 0.6498 | 829 | 0.1198 |
+| 0.6506 | 830 | 0.1546 |
+| 0.6514 | 831 | 0.1397 |
+| 0.6522 | 832 | 0.1578 |
+| 0.6529 | 833 | 0.1736 |
+| 0.6537 | 834 | 0.1174 |
+| 0.6545 | 835 | 0.1275 |
+| 0.6553 | 836 | 0.0971 |
+| 0.6561 | 837 | 0.1285 |
+| 0.6569 | 838 | 0.1285 |
+| 0.6577 | 839 | 0.1563 |
+| 0.6584 | 840 | 0.155 |
+| 0.6592 | 841 | 0.1398 |
+| 0.6600 | 842 | 0.1465 |
+| 0.6608 | 843 | 0.1201 |
+| 0.6616 | 844 | 0.1278 |
+| 0.6624 | 845 | 0.1155 |
+| 0.6631 | 846 | 0.0946 |
+| 0.6639 | 847 | 0.1152 |
+| 0.6647 | 848 | 0.1191 |
+| 0.6655 | 849 | 0.1175 |
+| 0.6663 | 850 | 0.133 |
+| 0.6671 | 851 | 0.1134 |
+| 0.6678 | 852 | 0.1664 |
+| 0.6686 | 853 | 0.1803 |
+| 0.6694 | 854 | 0.1155 |
+| 0.6702 | 855 | 0.1188 |
+| 0.6710 | 856 | 0.1283 |
+| 0.6718 | 857 | 0.0995 |
+| 0.6725 | 858 | 0.1438 |
+| 0.6733 | 859 | 0.1105 |
+| 0.6741 | 860 | 0.1114 |
+| 0.6749 | 861 | 0.089 |
+| 0.6757 | 862 | 0.1249 |
+| 0.6765 | 863 | 0.1194 |
+| 0.6772 | 864 | 0.1591 |
+| 0.6780 | 865 | 0.128 |
+| 0.6788 | 866 | 0.0787 |
+| 0.6796 | 867 | 0.13 |
+| 0.6804 | 868 | 0.0992 |
+| 0.6812 | 869 | 0.1229 |
+| 0.6820 | 870 | 0.095 |
+| 0.6827 | 871 | 0.1234 |
+| 0.6835 | 872 | 0.1201 |
+| 0.6843 | 873 | 0.1069 |
+| 0.6851 | 874 | 0.1282 |
+| 0.6859 | 875 | 0.1602 |
+| 0.6867 | 876 | 0.1 |
+| 0.6874 | 877 | 0.1437 |
+| 0.6882 | 878 | 0.1167 |
+| 0.6890 | 879 | 0.1841 |
+| 0.6898 | 880 | 0.1011 |
+| 0.6906 | 881 | 0.1264 |
+| 0.6914 | 882 | 0.1249 |
+| 0.6921 | 883 | 0.1261 |
+| 0.6929 | 884 | 0.1608 |
+| 0.6937 | 885 | 0.1398 |
+| 0.6945 | 886 | 0.15 |
+| 0.6953 | 887 | 0.1562 |
+| 0.6961 | 888 | 0.1092 |
+| 0.6968 | 889 | 0.1311 |
+| 0.6976 | 890 | 0.1564 |
+| 0.6984 | 891 | 0.1224 |
+| 0.6992 | 892 | 0.1126 |
+| 0.7000 | 893 | 0.0974 |
+| 0.7008 | 894 | 0.1638 |
+| 0.7015 | 895 | 0.118 |
+| 0.7023 | 896 | 0.1156 |
+| 0.7031 | 897 | 0.1141 |
+| 0.7039 | 898 | 0.1756 |
+| 0.7047 | 899 | 0.1165 |
+| 0.7055 | 900 | 0.142 |
+| 0.7063 | 901 | 0.1705 |
+| 0.7070 | 902 | 0.1311 |
+| 0.7078 | 903 | 0.1045 |
+| 0.7086 | 904 | 0.1034 |
+| 0.7094 | 905 | 0.1205 |
+| 0.7102 | 906 | 0.1448 |
+| 0.7110 | 907 | 0.1318 |
+| 0.7117 | 908 | 0.1369 |
+| 0.7125 | 909 | 0.1427 |
+| 0.7133 | 910 | 0.1218 |
+| 0.7141 | 911 | 0.103 |
+| 0.7149 | 912 | 0.1147 |
+| 0.7157 | 913 | 0.1297 |
+| 0.7164 | 914 | 0.1089 |
+| 0.7172 | 915 | 0.1371 |
+| 0.7180 | 916 | 0.1182 |
+| 0.7188 | 917 | 0.1273 |
+| 0.7196 | 918 | 0.1238 |
+| 0.7204 | 919 | 0.144 |
+| 0.7211 | 920 | 0.0859 |
+| 0.7219 | 921 | 0.0939 |
+| 0.7227 | 922 | 0.0999 |
+| 0.7235 | 923 | 0.1143 |
+| 0.7243 | 924 | 0.1251 |
+| 0.7251 | 925 | 0.107 |
+| 0.7258 | 926 | 0.1077 |
+| 0.7266 | 927 | 0.138 |
+| 0.7274 | 928 | 0.155 |
+| 0.7282 | 929 | 0.0977 |
+| 0.7290 | 930 | 0.1003 |
+| 0.7298 | 931 | 0.1382 |
+| 0.7306 | 932 | 0.1006 |
+| 0.7313 | 933 | 0.1027 |
+| 0.7321 | 934 | 0.1124 |
+| 0.7329 | 935 | 0.1813 |
+| 0.7337 | 936 | 0.1159 |
+| 0.7345 | 937 | 0.0791 |
+| 0.7353 | 938 | 0.1435 |
+| 0.7360 | 939 | 0.1288 |
+| 0.7368 | 940 | 0.1078 |
+| 0.7376 | 941 | 0.127 |
+| 0.7384 | 942 | 0.1211 |
+| 0.7392 | 943 | 0.1442 |
+| 0.7400 | 944 | 0.1668 |
+| 0.7407 | 945 | 0.1679 |
+| 0.7415 | 946 | 0.1168 |
+| 0.7423 | 947 | 0.1626 |
+| 0.7431 | 948 | 0.1538 |
+| 0.7439 | 949 | 0.0938 |
+| 0.7447 | 950 | 0.1657 |
+| 0.7454 | 951 | 0.1303 |
+| 0.7462 | 952 | 0.098 |
+| 0.7470 | 953 | 0.1014 |
+| 0.7478 | 954 | 0.1153 |
+| 0.7486 | 955 | 0.1192 |
+| 0.7494 | 956 | 0.1418 |
+| 0.7501 | 957 | 0.1206 |
+| 0.7509 | 958 | 0.109 |
+| 0.7517 | 959 | 0.1 |
+| 0.7525 | 960 | 0.115 |
+| 0.7533 | 961 | 0.1099 |
+| 0.7541 | 962 | 0.1252 |
+| 0.7549 | 963 | 0.0938 |
+| 0.7556 | 964 | 0.1704 |
+| 0.7564 | 965 | 0.1313 |
+| 0.7572 | 966 | 0.1342 |
+| 0.7580 | 967 | 0.1648 |
+| 0.7588 | 968 | 0.107 |
+| 0.7596 | 969 | 0.1177 |
+| 0.7603 | 970 | 0.1528 |
+| 0.7611 | 971 | 0.1577 |
+| 0.7619 | 972 | 0.1109 |
+| 0.7627 | 973 | 0.1336 |
+| 0.7635 | 974 | 0.1544 |
+| 0.7643 | 975 | 0.1304 |
+| 0.7650 | 976 | 0.1083 |
+| 0.7658 | 977 | 0.1017 |
+| 0.7666 | 978 | 0.1492 |
+| 0.7674 | 979 | 0.0846 |
+| 0.7682 | 980 | 0.1179 |
+| 0.7690 | 981 | 0.1634 |
+| 0.7697 | 982 | 0.0893 |
+| 0.7705 | 983 | 0.1357 |
+| 0.7713 | 984 | 0.1757 |
+| 0.7721 | 985 | 0.1112 |
+| 0.7729 | 986 | 0.1258 |
+| 0.7737 | 987 | 0.123 |
+| 0.7744 | 988 | 0.1354 |
+| 0.7752 | 989 | 0.0855 |
+| 0.7760 | 990 | 0.1167 |
+| 0.7768 | 991 | 0.1131 |
+| 0.7776 | 992 | 0.1222 |
+| 0.7784 | 993 | 0.1447 |
+| 0.7791 | 994 | 0.1122 |
+| 0.7799 | 995 | 0.1508 |
+| 0.7807 | 996 | 0.1484 |
+| 0.7815 | 997 | 0.0985 |
+| 0.7823 | 998 | 0.1686 |
+| 0.7831 | 999 | 0.1509 |
+| 0.7839 | 1000 | 0.1356 |
+| 0.7846 | 1001 | 0.1114 |
+| 0.7854 | 1002 | 0.1098 |
+| 0.7862 | 1003 | 0.1643 |
+| 0.7870 | 1004 | 0.1784 |
+| 0.7878 | 1005 | 0.1038 |
+| 0.7886 | 1006 | 0.1362 |
+| 0.7893 | 1007 | 0.1289 |
+| 0.7901 | 1008 | 0.1188 |
+| 0.7909 | 1009 | 0.1065 |
+| 0.7917 | 1010 | 0.1195 |
+| 0.7925 | 1011 | 0.1142 |
+| 0.7933 | 1012 | 0.0801 |
+| 0.7940 | 1013 | 0.1427 |
+| 0.7948 | 1014 | 0.2034 |
+| 0.7956 | 1015 | 0.1508 |
+| 0.7964 | 1016 | 0.0888 |
+| 0.7972 | 1017 | 0.0847 |
+| 0.7980 | 1018 | 0.1007 |
+| 0.7987 | 1019 | 0.1122 |
+| 0.7995 | 1020 | 0.1215 |
+| 0.8003 | 1021 | 0.1529 |
+| 0.8011 | 1022 | 0.1095 |
+| 0.8019 | 1023 | 0.1364 |
+| 0.8027 | 1024 | 0.0978 |
+| 0.8034 | 1025 | 0.1606 |
+| 0.8042 | 1026 | 0.1131 |
+| 0.8050 | 1027 | 0.0861 |
+| 0.8058 | 1028 | 0.1523 |
+| 0.8066 | 1029 | 0.1444 |
+| 0.8074 | 1030 | 0.1255 |
+| 0.8082 | 1031 | 0.1418 |
+| 0.8089 | 1032 | 0.1007 |
+| 0.8097 | 1033 | 0.1042 |
+| 0.8105 | 1034 | 0.1423 |
+| 0.8113 | 1035 | 0.1137 |
+| 0.8121 | 1036 | 0.1314 |
+| 0.8129 | 1037 | 0.1572 |
+| 0.8136 | 1038 | 0.1188 |
+| 0.8144 | 1039 | 0.0916 |
+| 0.8152 | 1040 | 0.1043 |
+| 0.8160 | 1041 | 0.1333 |
+| 0.8168 | 1042 | 0.1299 |
+| 0.8176 | 1043 | 0.1404 |
+| 0.8183 | 1044 | 0.1209 |
+| 0.8191 | 1045 | 0.0973 |
+| 0.8199 | 1046 | 0.1359 |
+| 0.8207 | 1047 | 0.1194 |
+| 0.8215 | 1048 | 0.2011 |
+| 0.8223 | 1049 | 0.1306 |
+| 0.8230 | 1050 | 0.1073 |
+| 0.8238 | 1051 | 0.1154 |
+| 0.8246 | 1052 | 0.1224 |
+| 0.8254 | 1053 | 0.1045 |
+| 0.8262 | 1054 | 0.1067 |
+| 0.8270 | 1055 | 0.1086 |
+| 0.8277 | 1056 | 0.0923 |
+| 0.8285 | 1057 | 0.1228 |
+| 0.8293 | 1058 | 0.1474 |
+| 0.8301 | 1059 | 0.0949 |
+| 0.8309 | 1060 | 0.1259 |
+| 0.8317 | 1061 | 0.1152 |
+| 0.8325 | 1062 | 0.0937 |
+| 0.8332 | 1063 | 0.1602 |
+| 0.8340 | 1064 | 0.1165 |
+| 0.8348 | 1065 | 0.1036 |
+| 0.8356 | 1066 | 0.1665 |
+| 0.8364 | 1067 | 0.1163 |
+| 0.8372 | 1068 | 0.1124 |
+| 0.8379 | 1069 | 0.1093 |
+| 0.8387 | 1070 | 0.1015 |
+| 0.8395 | 1071 | 0.1602 |
+| 0.8403 | 1072 | 0.0913 |
+| 0.8411 | 1073 | 0.1327 |
+| 0.8419 | 1074 | 0.1149 |
+| 0.8426 | 1075 | 0.1137 |
+| 0.8434 | 1076 | 0.1197 |
+| 0.8442 | 1077 | 0.1335 |
+| 0.8450 | 1078 | 0.1366 |
+| 0.8458 | 1079 | 0.1265 |
+| 0.8466 | 1080 | 0.0921 |
+| 0.8473 | 1081 | 0.1339 |
+| 0.8481 | 1082 | 0.1155 |
+| 0.8489 | 1083 | 0.103 |
+| 0.8497 | 1084 | 0.1302 |
+| 0.8505 | 1085 | 0.1311 |
+| 0.8513 | 1086 | 0.1275 |
+| 0.8520 | 1087 | 0.1585 |
+| 0.8528 | 1088 | 0.0961 |
+| 0.8536 | 1089 | 0.1222 |
+| 0.8544 | 1090 | 0.0887 |
+| 0.8552 | 1091 | 0.1599 |
+| 0.8560 | 1092 | 0.0909 |
+| 0.8568 | 1093 | 0.1566 |
+| 0.8575 | 1094 | 0.1201 |
+| 0.8583 | 1095 | 0.0786 |
+| 0.8591 | 1096 | 0.1383 |
+| 0.8599 | 1097 | 0.1593 |
+| 0.8607 | 1098 | 0.1582 |
+| 0.8615 | 1099 | 0.1474 |
+| 0.8622 | 1100 | 0.0924 |
+| 0.8630 | 1101 | 0.1379 |
+| 0.8638 | 1102 | 0.1324 |
+| 0.8646 | 1103 | 0.1139 |
+| 0.8654 | 1104 | 0.0941 |
+| 0.8662 | 1105 | 0.1107 |
+| 0.8669 | 1106 | 0.1183 |
+| 0.8677 | 1107 | 0.1024 |
+| 0.8685 | 1108 | 0.1346 |
+| 0.8693 | 1109 | 0.131 |
+| 0.8701 | 1110 | 0.1244 |
+| 0.8709 | 1111 | 0.1423 |
+| 0.8716 | 1112 | 0.1604 |
+| 0.8724 | 1113 | 0.146 |
+| 0.8732 | 1114 | 0.1398 |
+| 0.8740 | 1115 | 0.1393 |
+| 0.8748 | 1116 | 0.1643 |
+| 0.8756 | 1117 | 0.1006 |
+| 0.8763 | 1118 | 0.0956 |
+| 0.8771 | 1119 | 0.1304 |
+| 0.8779 | 1120 | 0.1151 |
+| 0.8787 | 1121 | 0.161 |
+| 0.8795 | 1122 | 0.0871 |
+| 0.8803 | 1123 | 0.1028 |
+| 0.8811 | 1124 | 0.1715 |
+| 0.8818 | 1125 | 0.1674 |
+| 0.8826 | 1126 | 0.1073 |
+| 0.8834 | 1127 | 0.0867 |
+| 0.8842 | 1128 | 0.1117 |
+| 0.8850 | 1129 | 0.1333 |
+| 0.8858 | 1130 | 0.126 |
+| 0.8865 | 1131 | 0.0853 |
+| 0.8873 | 1132 | 0.1152 |
+| 0.8881 | 1133 | 0.1467 |
+| 0.8889 | 1134 | 0.1643 |
+| 0.8897 | 1135 | 0.1117 |
+| 0.8905 | 1136 | 0.0909 |
+| 0.8912 | 1137 | 0.1645 |
+| 0.8920 | 1138 | 0.1359 |
+| 0.8928 | 1139 | 0.1204 |
+| 0.8936 | 1140 | 0.1574 |
+| 0.8944 | 1141 | 0.1187 |
+| 0.8952 | 1142 | 0.1588 |
+| 0.8959 | 1143 | 0.1419 |
+| 0.8967 | 1144 | 0.1109 |
+| 0.8975 | 1145 | 0.1048 |
+| 0.8983 | 1146 | 0.1232 |
+| 0.8991 | 1147 | 0.1159 |
+| 0.8999 | 1148 | 0.1442 |
+| 0.9006 | 1149 | 0.1345 |
+| 0.9014 | 1150 | 0.0893 |
+| 0.9022 | 1151 | 0.1033 |
+| 0.9030 | 1152 | 0.1133 |
+| 0.9038 | 1153 | 0.2009 |
+| 0.9046 | 1154 | 0.1669 |
+| 0.9053 | 1155 | 0.1095 |
+| 0.9061 | 1156 | 0.1099 |
+| 0.9069 | 1157 | 0.0893 |
+| 0.9077 | 1158 | 0.137 |
+| 0.9085 | 1159 | 0.1346 |
+| 0.9093 | 1160 | 0.1135 |
+| 0.9101 | 1161 | 0.1003 |
+| 0.9108 | 1162 | 0.1224 |
+| 0.9116 | 1163 | 0.098 |
+| 0.9124 | 1164 | 0.1353 |
+| 0.9132 | 1165 | 0.1481 |
+| 0.9140 | 1166 | 0.1168 |
+| 0.9148 | 1167 | 0.0794 |
+| 0.9155 | 1168 | 0.0979 |
+| 0.9163 | 1169 | 0.1093 |
+| 0.9171 | 1170 | 0.1022 |
+| 0.9179 | 1171 | 0.1498 |
+| 0.9187 | 1172 | 0.1596 |
+| 0.9195 | 1173 | 0.1657 |
+| 0.9202 | 1174 | 0.1195 |
+| 0.9210 | 1175 | 0.1278 |
+| 0.9218 | 1176 | 0.1307 |
+| 0.9226 | 1177 | 0.1071 |
+| 0.9234 | 1178 | 0.0969 |
+| 0.9242 | 1179 | 0.1192 |
+| 0.9249 | 1180 | 0.1166 |
+| 0.9257 | 1181 | 0.1221 |
+| 0.9265 | 1182 | 0.1179 |
+| 0.9273 | 1183 | 0.1414 |
+| 0.9281 | 1184 | 0.1247 |
+| 0.9289 | 1185 | 0.1148 |
+| 0.9296 | 1186 | 0.1211 |
+| 0.9304 | 1187 | 0.1373 |
+| 0.9312 | 1188 | 0.1105 |
+| 0.9320 | 1189 | 0.0911 |
+| 0.9328 | 1190 | 0.1205 |
+| 0.9336 | 1191 | 0.1479 |
+| 0.9344 | 1192 | 0.115 |
+| 0.9351 | 1193 | 0.0951 |
+| 0.9359 | 1194 | 0.1501 |
+| 0.9367 | 1195 | 0.1069 |
+| 0.9375 | 1196 | 0.1091 |
+| 0.9383 | 1197 | 0.0988 |
+| 0.9391 | 1198 | 0.1278 |
+| 0.9398 | 1199 | 0.1221 |
+| 0.9406 | 1200 | 0.1418 |
+| 0.9414 | 1201 | 0.1354 |
+| 0.9422 | 1202 | 0.1435 |
+| 0.9430 | 1203 | 0.101 |
+| 0.9438 | 1204 | 0.1119 |
+| 0.9445 | 1205 | 0.1566 |
+| 0.9453 | 1206 | 0.1238 |
+| 0.9461 | 1207 | 0.1008 |
+| 0.9469 | 1208 | 0.1126 |
+| 0.9477 | 1209 | 0.0897 |
+| 0.9485 | 1210 | 0.1486 |
+| 0.9492 | 1211 | 0.0976 |
+| 0.9500 | 1212 | 0.124 |
+| 0.9508 | 1213 | 0.1034 |
+| 0.9516 | 1214 | 0.1229 |
1556
+ | 0.9524 | 1215 | 0.1301 |
1557
+ | 0.9532 | 1216 | 0.1363 |
1558
+ | 0.9539 | 1217 | 0.1161 |
1559
+ | 0.9547 | 1218 | 0.1199 |
1560
+ | 0.9555 | 1219 | 0.0815 |
1561
+ | 0.9563 | 1220 | 0.1034 |
1562
+ | 0.9571 | 1221 | 0.1554 |
1563
+ | 0.9579 | 1222 | 0.1266 |
1564
+ | 0.9587 | 1223 | 0.1153 |
1565
+ | 0.9594 | 1224 | 0.1129 |
1566
+ | 0.9602 | 1225 | 0.1228 |
1567
+ | 0.9610 | 1226 | 0.1268 |
1568
+ | 0.9618 | 1227 | 0.1515 |
1569
+ | 0.9626 | 1228 | 0.0885 |
1570
+ | 0.9634 | 1229 | 0.1142 |
1571
+ | 0.9641 | 1230 | 0.187 |
1572
+ | 0.9649 | 1231 | 0.0836 |
1573
+ | 0.9657 | 1232 | 0.0967 |
1574
+ | 0.9665 | 1233 | 0.1516 |
1575
+ | 0.9673 | 1234 | 0.0581 |
1576
+ | 0.9681 | 1235 | 0.0847 |
1577
+ | 0.9688 | 1236 | 0.1105 |
1578
+ | 0.9696 | 1237 | 0.0958 |
1579
+ | 0.9704 | 1238 | 0.1238 |
1580
+ | 0.9712 | 1239 | 0.1076 |
1581
+ | 0.9720 | 1240 | 0.1137 |
1582
+ | 0.9728 | 1241 | 0.1236 |
1583
+ | 0.9735 | 1242 | 0.129 |
1584
+ | 0.9743 | 1243 | 0.1113 |
1585
+ | 0.9751 | 1244 | 0.1466 |
1586
+ | 0.9759 | 1245 | 0.1593 |
1587
+ | 0.9767 | 1246 | 0.1151 |
1588
+ | 0.9775 | 1247 | 0.153 |
1589
+ | 0.9782 | 1248 | 0.1564 |
1590
+ | 0.9790 | 1249 | 0.1208 |
1591
+ | 0.9798 | 1250 | 0.0925 |
1592
+ | 0.9806 | 1251 | 0.1146 |
1593
+ | 0.9814 | 1252 | 0.1043 |
1594
+ | 0.9822 | 1253 | 0.0926 |
1595
+ | 0.9830 | 1254 | 0.1442 |
1596
+ | 0.9837 | 1255 | 0.134 |
1597
+ | 0.9845 | 1256 | 0.0841 |
1598
+ | 0.9853 | 1257 | 0.1256 |
1599
+ | 0.9861 | 1258 | 0.12 |
1600
+ | 0.9869 | 1259 | 0.0815 |
1601
+ | 0.9877 | 1260 | 0.1298 |
1602
+ | 0.9884 | 1261 | 0.1569 |
1603
+ | 0.9892 | 1262 | 0.1296 |
1604
+ | 0.9900 | 1263 | 0.1418 |
1605
+ | 0.9908 | 1264 | 0.1204 |
1606
+ | 0.9916 | 1265 | 0.1207 |
1607
+ | 0.9924 | 1266 | 0.1116 |
1608
+ | 0.9931 | 1267 | 0.0807 |
1609
+ | 0.9939 | 1268 | 0.1082 |
1610
+ | 0.9947 | 1269 | 0.1213 |
1611
+ | 0.9955 | 1270 | 0.1156 |
1612
+ | 0.9963 | 1271 | 0.1517 |
1613
+ | 0.9971 | 1272 | 0.1238 |
1614
+ | 0.9978 | 1273 | 0.1313 |
1615
+ | 0.9986 | 1274 | 0.131 |
1616
+ | 0.9994 | 1275 | 0.1584 |
1617
+
1618
+ </details>
1619
+
1620
+ ### Framework Versions
1621
+ - Python: 3.10.12
1622
+ - Sentence Transformers: 3.2.1
1623
+ - Transformers: 4.44.2
1624
+ - PyTorch: 2.3.1+cu121
1625
+ - Accelerate: 1.1.1
1626
+ - Datasets: 2.21.0
1627
+ - Tokenizers: 0.19.1
1628
+
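+ The model ships custom `configuration.py`/`modeling.py` (wired up via `auto_map` in `config.json` below), so loading requires `trust_remote_code=True`. A minimal usage sketch under the framework versions above; the Hub repo id is an assumption, substitute the actual one:
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Assumed repo id -- replace with the actual model id on the Hub.
+ model = SentenceTransformer("seongil-dn/gte-base-250k-answerableHN", trust_remote_code=True)
+ embeddings = model.encode(["example query", "example passage"])
+ print(embeddings.shape)  # (2, 768), matching hidden_size=768 in config.json
+ ```
+ 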
1629
+ ## Citation
1630
+
1631
+ ### BibTeX
1632
+
1633
+ #### Sentence Transformers
1634
+ ```bibtex
1635
+ @inproceedings{reimers-2019-sentence-bert,
1636
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
1637
+ author = "Reimers, Nils and Gurevych, Iryna",
1638
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
1639
+ month = "11",
1640
+ year = "2019",
1641
+ publisher = "Association for Computational Linguistics",
1642
+ url = "https://arxiv.org/abs/1908.10084",
1643
+ }
1644
+ ```
1645
+
1646
+ #### MultipleNegativesRankingLoss
1647
+ ```bibtex
1648
+ @misc{henderson2017efficient,
1649
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
1650
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
1651
+ year={2017},
1652
+ eprint={1705.00652},
1653
+ archivePrefix={arXiv},
1654
+ primaryClass={cs.CL}
1655
+ }
1656
+ ```
1657
+
1658
+ <!--
1659
+ ## Glossary
1660
+
1661
+ *Clearly define terms in order to be accessible across audiences.*
1662
+ -->
1663
+
1664
+ <!--
1665
+ ## Model Card Authors
1666
+
1667
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
1668
+ -->
1669
+
1670
+ <!--
1671
+ ## Model Card Contact
1672
+
1673
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
1674
+ -->
config.json ADDED
@@ -0,0 +1,50 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "/root/models/gte-base-250k-answerableHN/checkpoint-1275",
3
+ "architectures": [
4
+ "NewModel"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration.NewConfig",
9
+ "AutoModel": "modeling.NewModel",
10
+ "AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
11
+ "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
12
+ "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
13
+ "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
14
+ "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
15
+ },
16
+ "classifier_dropout": 0.0,
17
+ "hidden_act": "gelu",
18
+ "hidden_dropout_prob": 0.1,
19
+ "hidden_size": 768,
20
+ "id2label": {
21
+ "0": "LABEL_0"
22
+ },
23
+ "initializer_range": 0.02,
24
+ "intermediate_size": 3072,
25
+ "label2id": {
26
+ "LABEL_0": 0
27
+ },
28
+ "layer_norm_eps": 1e-12,
29
+ "layer_norm_type": "layer_norm",
30
+ "logn_attention_clip1": false,
31
+ "logn_attention_scale": false,
32
+ "max_position_embeddings": 8192,
33
+ "model_type": "new",
34
+ "num_attention_heads": 12,
35
+ "num_hidden_layers": 12,
36
+ "pack_qkv": true,
37
+ "pad_token_id": 1,
38
+ "position_embedding_type": "rope",
39
+ "rope_scaling": {
40
+ "factor": 8.0,
41
+ "type": "ntk"
42
+ },
43
+ "rope_theta": 20000,
44
+ "torch_dtype": "float32",
45
+ "transformers_version": "4.44.2",
46
+ "type_vocab_size": 1,
47
+ "unpad_inputs": false,
48
+ "use_memory_efficient_attention": false,
49
+ "vocab_size": 250048
50
+ }
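
A note on the configuration above: with `"rope_scaling": {"type": "ntk", "factor": 8.0}`, the bundled `NTKScalingRotaryEmbedding` (see `modeling.py` below) builds its rotary cos/sin cache for `max_position_embeddings * factor` positions. A minimal sketch, assuming the bundled `configuration.py` is importable from the working directory:

```python
from configuration import NewConfig  # shipped with this repo

config = NewConfig(
    vocab_size=250048,
    max_position_embeddings=8192,
    rope_theta=20000,
    rope_scaling={"type": "ntk", "factor": 8.0},
)
# Fixed NTK scaling extends the rotary cache to 8192 * 8 = 65536 positions.
print(int(config.max_position_embeddings * config.rope_scaling["factor"]))  # 65536
```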
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.2.1",
4
+ "transformers": "4.44.2",
5
+ "pytorch": "2.3.1+cu121"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": null
10
+ }
configuration.py ADDED
@@ -0,0 +1,145 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # coding=utf-8
2
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """ NEW model configuration"""
17
+ from transformers.configuration_utils import PretrainedConfig
18
+ from transformers.utils import logging
19
+
20
+ logger = logging.get_logger(__name__)
21
+
22
+
23
+ class NewConfig(PretrainedConfig):
24
+ r"""
25
+ This is the configuration class to store the configuration of a [`NewModel`] or a [`TFNewModel`]. It is used to
26
+ instantiate a NEW model according to the specified arguments, defining the model architecture. Instantiating a
27
+ configuration with the defaults will yield a similar configuration to that of the NEW
28
+ [izhx/new-base-en](https://huggingface.co/izhx/new-base-en) architecture.
29
+
30
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
31
+ documentation from [`PretrainedConfig`] for more information.
32
+
33
+
34
+ Args:
35
+ vocab_size (`int`, *optional*, defaults to 30528):
36
+ Vocabulary size of the NEW model. Defines the number of different tokens that can be represented by the
37
+ `input_ids` passed when calling [`NewModel`] or [`TFNewModel`].
38
+ hidden_size (`int`, *optional*, defaults to 768):
39
+ Dimensionality of the encoder layers and the pooler layer.
40
+ num_hidden_layers (`int`, *optional*, defaults to 12):
41
+ Number of hidden layers in the Transformer encoder.
42
+ num_attention_heads (`int`, *optional*, defaults to 12):
43
+ Number of attention heads for each attention layer in the Transformer encoder.
44
+ intermediate_size (`int`, *optional*, defaults to 3072):
45
+ Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
46
+ hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
47
+ The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
48
+ `"relu"`, `"silu"` and `"gelu_new"` are supported.
49
+ hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
50
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
51
+ attention_probs_dropout_prob (`float`, *optional*, defaults to 0.0):
52
+ The dropout ratio for the attention probabilities.
53
+ max_position_embeddings (`int`, *optional*, defaults to 2048):
54
+ The maximum sequence length that this model might ever be used with. Typically set this to something large
55
+ just in case (e.g., 512 or 1024 or 2048).
56
+ type_vocab_size (`int`, *optional*, defaults to 1):
57
+ The vocabulary size of the `token_type_ids` passed when calling [`NewModel`] or [`TFNewModel`].
58
+ initializer_range (`float`, *optional*, defaults to 0.02):
59
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
60
+ layer_norm_eps (`float`, *optional*, defaults to 1e-12):
61
+ The epsilon used by the layer normalization layers.
62
+ position_embedding_type (`str`, *optional*, defaults to `"rope"`):
63
+ Type of position embedding. Choose one of `"absolute"`, `"rope"`.
64
+ rope_theta (`float`, *optional*, defaults to 10000.0):
65
+ The base period of the RoPE embeddings.
66
+ rope_scaling (`Dict`, *optional*):
67
+ Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
68
+ strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
69
+ `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
70
+ `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
71
+ these scaling strategies behave:
72
+ https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
73
+ experimental feature, subject to breaking API changes in future versions.
74
+ classifier_dropout (`float`, *optional*):
75
+ The dropout ratio for the classification head.
76
+
77
+ Examples:
78
+
79
+ ```python
80
+ >>> from transformers import NewConfig, NewModel
81
+
82
+ >>> # Initializing a NEW izhx/new-base-en style configuration
83
+ >>> configuration = NewConfig()
84
+
85
+ >>> # Initializing a model (with random weights) from the izhx/new-base-en style configuration
86
+ >>> model = NewModel(configuration)
87
+
88
+ >>> # Accessing the model configuration
89
+ >>> configuration = model.config
90
+ ```"""
91
+
92
+ model_type = "new"
93
+
94
+ def __init__(
95
+ self,
96
+ vocab_size=30528,
97
+ hidden_size=768,
98
+ num_hidden_layers=12,
99
+ num_attention_heads=12,
100
+ intermediate_size=3072,
101
+ hidden_act="gelu",
102
+ hidden_dropout_prob=0.1,
103
+ attention_probs_dropout_prob=0.0,
104
+ max_position_embeddings=2048,
105
+ type_vocab_size=1,
106
+ initializer_range=0.02,
107
+ layer_norm_type='layer_norm',
108
+ layer_norm_eps=1e-12,
109
+ # pad_token_id=0,
110
+ position_embedding_type="rope",
111
+ rope_theta=10000.0,
112
+ rope_scaling=None,
113
+ classifier_dropout=None,
114
+ pack_qkv=True,
115
+ unpad_inputs=False,
116
+ use_memory_efficient_attention=False,
117
+ logn_attention_scale=False,
118
+ logn_attention_clip1=False,
119
+ **kwargs,
120
+ ):
121
+ super().__init__(**kwargs)
122
+
123
+ self.vocab_size = vocab_size
124
+ self.hidden_size = hidden_size
125
+ self.num_hidden_layers = num_hidden_layers
126
+ self.num_attention_heads = num_attention_heads
127
+ self.hidden_act = hidden_act
128
+ self.intermediate_size = intermediate_size
129
+ self.hidden_dropout_prob = hidden_dropout_prob
130
+ self.attention_probs_dropout_prob = attention_probs_dropout_prob
131
+ self.max_position_embeddings = max_position_embeddings
132
+ self.type_vocab_size = type_vocab_size
133
+ self.initializer_range = initializer_range
134
+ self.layer_norm_type = layer_norm_type
135
+ self.layer_norm_eps = layer_norm_eps
136
+ self.position_embedding_type = position_embedding_type
137
+ self.rope_theta = rope_theta
138
+ self.rope_scaling = rope_scaling
139
+ self.classifier_dropout = classifier_dropout
140
+
141
+ self.pack_qkv = pack_qkv
142
+ self.unpad_inputs = unpad_inputs
143
+ self.use_memory_efficient_attention = use_memory_efficient_attention
144
+ self.logn_attention_scale = logn_attention_scale
145
+ self.logn_attention_clip1 = logn_attention_clip1
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:89fbfee7d253e4980ae8ec279072e5ca562eb385ee29914ded1e270b873c7642
3
+ size 1221487872
modeling.py ADDED
@@ -0,0 +1,1418 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # coding=utf-8
2
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+ """PyTorch NEW model."""
17
+
18
+ import math
19
+ from dataclasses import dataclass
20
+ from typing import List, Optional, Tuple, Union
21
+
22
+ import torch
23
+ import torch.utils.checkpoint
24
+ from torch import nn
25
+
26
+ from transformers.activations import ACT2FN
27
+ from transformers.modeling_outputs import (
28
+ BaseModelOutput,
29
+ BaseModelOutputWithPooling,
30
+ MaskedLMOutput,
31
+ MultipleChoiceModelOutput,
32
+ QuestionAnsweringModelOutput,
33
+ SequenceClassifierOutput,
34
+ ModelOutput,
35
+ )
36
+ from transformers.modeling_utils import PreTrainedModel
37
+ from transformers.utils import logging
38
+
39
+ try:
40
+ import xformers.ops as xops
41
+ except ImportError:
42
+ xops = None
43
+
44
+ from .configuration import NewConfig
45
+
46
+
47
+ logger = logging.get_logger(__name__)
48
+
49
+
50
+ # Adapted from https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/bert_padding.py
51
+ # Which was adapted from https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py
52
+ class IndexFirstAxis(torch.autograd.Function):
53
+ @staticmethod
54
+ def forward(ctx, input, indices):
55
+ ctx.save_for_backward(indices)
56
+ assert input.ndim >= 2
57
+ ctx.first_axis_dim, other_shape = input.shape[0], input.shape[1:]
58
+ second_dim = other_shape.numel()
59
+ # TD [2022-03-04] For some reason torch.gather is a bit faster than indexing.
60
+ # return input[indices]
61
+ # return torch.gather(
62
+ # rearrange(input, "b ... -> b (...)"), 0, repeat(indices, "z -> z d", d=second_dim)
63
+ # ).reshape(-1, *other_shape)
64
+ return torch.gather(
65
+ input.view(ctx.first_axis_dim, second_dim),
66
+ 0,
67
+ indices.unsqueeze(-1).expand(indices.size(0), second_dim)
68
+ ).reshape(-1, *other_shape)
69
+
70
+ @staticmethod
71
+ def backward(ctx, grad_output):
72
+ (indices,) = ctx.saved_tensors
73
+ assert grad_output.ndim >= 2
74
+ other_shape = grad_output.shape[1:]
75
+ # grad_output = rearrange(grad_output, "b ... -> b (...)")
76
+ grad_output = grad_output.view(grad_output.size(0), other_shape.numel())
77
+ grad_input = torch.zeros(
78
+ [ctx.first_axis_dim, grad_output.shape[1]],
79
+ device=grad_output.device,
80
+ dtype=grad_output.dtype,
81
+ )
82
+ # TD [2022-03-04] For some reason torch.scatter is a bit faster than indexing.
83
+ # grad_input[indices] = grad_output
84
+ # grad_input.scatter_(0, repeat(indices, "z -> z d", d=grad_output.shape[1]), grad_output)
85
+ grad_input.scatter_(
86
+ 0, indices.unsqueeze(-1).expand(indices.size(0), grad_output.size(1)), grad_output
87
+ )
88
+ return grad_input.reshape(ctx.first_axis_dim, *other_shape), None
89
+
90
+
91
+ index_first_axis = IndexFirstAxis.apply
92
+
93
+
94
+ def unpad_input(hidden_states, attention_mask=None, indices=None):
95
+ """
96
+ Arguments:
97
+ hidden_states: (batch, seqlen, ...)
98
+ attention_mask: (batch, seqlen), bool / int, 1 means valid and 0 means not valid.
99
+ indices: (total_nnz), the indices of non-masked tokens from the flattened input sequence.
100
+ Return:
101
+ hidden_states: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
102
+ """
103
+ if indices is None:
104
+ assert attention_mask is not None
105
+ indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
106
+
107
+ # TD [2022-03-04] We don't want to index with a bool mask, because Pytorch will expand the
108
+ # bool mask, then call nonzero to get the indices, then index with those. The indices are @dim
109
+ # times larger than they need to be, wasting memory. It's faster and more memory-efficient to
110
+ # index with integer indices. Moreover, torch's index is a bit slower than it needs to be,
111
+ # so we write custom forward and backward to make it a bit faster.
112
+ hidden_states = hidden_states.view(-1, *hidden_states.shape[2:])
113
+ return index_first_axis(hidden_states, indices)
114
+
115
+
116
+ class IndexPutFirstAxis(torch.autograd.Function):
117
+ @staticmethod
118
+ def forward(
119
+ ctx,
120
+ values: torch.Tensor,
121
+ indices: torch.Tensor,
122
+ first_axis_dim
123
+ ) -> torch.Tensor:
124
+ ctx.save_for_backward(indices)
125
+ assert indices.ndim == 1
126
+ assert values.ndim >= 2
127
+ output = torch.zeros(
128
+ first_axis_dim, *values.shape[1:], device=values.device, dtype=values.dtype
129
+ )
130
+ output[indices] = values
131
+ return output
132
+
133
+ @staticmethod
134
+ def backward(ctx, grad_output: torch.Tensor) -> Tuple[torch.Tensor, None, None]:
135
+ indices, = ctx.saved_tensors
136
+ grad_values = grad_output[indices]
137
+ return grad_values, None, None
138
+
139
+
140
+ index_put_first_axis = IndexPutFirstAxis.apply
141
+
142
+
143
+ def pad_input(inputs: torch.Tensor, indices: torch.Tensor, batch: int, seqlen: int) -> torch.Tensor:
144
+ """Add padding to sequences.
145
+
146
+ Arguments:
147
+ inputs: (total_nnz, ...), where total_nnz = number of tokens selected in attention_mask.
148
+ indices: (total_nnz), `indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()`
149
+ batch: int batch_size
150
+ seqlen: int max sequence length
151
+
152
+ Returns:
153
+ inputs: (batch, seqlen, ...)
154
+ """
155
+ output = index_put_first_axis(inputs, indices, batch * seqlen)
156
+ return output.view(batch, seqlen, *inputs.shape[1:])
157
+
158
+
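+ # Illustrative round-trip of the two helpers above (tensor shapes are example assumptions):
+ #   mask = torch.tensor([[1, 1, 0], [1, 1, 1]])                 # (batch=2, seqlen=3)
+ #   hidden = torch.randn(2, 3, 4)                               # (batch, seqlen, dim)
+ #   indices = torch.nonzero(mask.flatten(), as_tuple=False).flatten()
+ #   packed = unpad_input(hidden, attention_mask=mask)           # (5, 4): valid tokens only
+ #   restored = pad_input(packed, indices, batch=2, seqlen=3)    # (2, 3, 4), zeros at pads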
159
+ def rotate_half(x):
160
+ """Rotates half the hidden dims of the input."""
161
+ x1 = x[..., : x.shape[-1] // 2]
162
+ x2 = x[..., x.shape[-1] // 2 :]
163
+ return torch.cat((-x2, x1), dim=-1)
164
+
165
+
166
+ def apply_rotary_pos_emb(q, k, cos, sin):
167
+ """Applies Rotary Position Embedding to the query and key tensors.
168
+
169
+ Args:
170
+ q (`torch.Tensor`): The query tensor.
171
+ k (`torch.Tensor`): The key tensor.
172
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
173
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
174
+ Returns:
175
+ `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
176
+ """
177
+ cos, sin = cos.to(q.dtype), sin.to(q.dtype)
178
+ q_embed = (q * cos) + (rotate_half(q) * sin)
179
+ k_embed = (k * cos) + (rotate_half(k) * sin)
180
+ return q_embed, k_embed
181
+
182
+
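+ # In effect, each pair (x_i, x_{i + d/2}) at position m is rotated by the angle m * theta_i:
+ #   x_i         -> x_i * cos(m * theta_i) - x_{i + d/2} * sin(m * theta_i)
+ #   x_{i + d/2} -> x_{i + d/2} * cos(m * theta_i) + x_i * sin(m * theta_i)
+ # so dot products between rotated queries and keys depend only on relative position.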
183
+ class RotaryEmbedding(torch.nn.Module):
184
+ def __init__(self, dim, max_position_embeddings=512, base=10000.0, device=None):
185
+ super().__init__()
186
+
187
+ self.dim = dim
188
+ self.max_position_embeddings = max_position_embeddings
189
+ self.base = base
190
+ inv_freq = 1.0 / (self.base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
191
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
192
+
193
+ # Build here to make `torch.jit.trace` work.
194
+ self._set_cos_sin_cache(
195
+ seq_len=max_position_embeddings, device=self.inv_freq.device, dtype=torch.get_default_dtype()
196
+ )
197
+
198
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
199
+ self.max_seq_len_cached = seq_len
200
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.float32)
201
+
202
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
203
+ # Different from paper, but it uses a different permutation in order to obtain the same calculation
204
+ emb = torch.cat((freqs, freqs), dim=-1)
205
+ self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
206
+ self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
207
+
208
+ def forward(self, x, seq_len=None):
209
+ # x: [bs, num_attention_heads, seq_len, head_size]
210
+ if seq_len > self.max_seq_len_cached:
211
+ self._set_cos_sin_cache(seq_len=seq_len, device=x.device, dtype=x.dtype)
212
+
213
+ return (
214
+ self.cos_cached[:seq_len, ...].to(dtype=x.dtype),
215
+ self.sin_cached[:seq_len, ...].to(dtype=x.dtype),
216
+ )
217
+
218
+
219
+ class NTKScalingRotaryEmbedding(RotaryEmbedding):
220
+ """RotaryEmbedding extended with fixed and mixed NTK scaling. https://kexue.fm/archives/9706 """
221
+
222
+ def __init__(self, dim, max_position_embeddings=512, base=10000, device=None, scaling_factor=1.0, mixed_b=None):
223
+ self.scaling_factor = scaling_factor
224
+ self.mixed_b = mixed_b
225
+ super().__init__(dim, max_position_embeddings, base, device)
226
+ max_position_embeddings = max_position_embeddings * self.scaling_factor
227
+ self._set_cos_sin_cache(max_position_embeddings, self.inv_freq.device, torch.get_default_dtype())
228
+
229
+ def _set_cos_sin_cache(self, seq_len, device, dtype):
230
+ self.max_seq_len_cached = seq_len
231
+
232
+ if seq_len > self.max_position_embeddings:
233
+ base = self.base * (self.scaling_factor if self.mixed_b is None else 1)
234
+ inv_freq = 1.0 / (base ** (torch.arange(0, self.dim, 2).float().to(device) / self.dim))
235
+
236
+ if self.mixed_b is None:
237
+ inv_freq = inv_freq / self.scaling_factor ** (2 / self.dim) # (6)
238
+ else:
239
+ a = torch.tensor(self.scaling_factor).log() / (self.dim / 2) ** self.mixed_b # (13)
240
+ lambda_1_m = (a * torch.arange(1, self.dim // 2 + 1).float().to(device) ** self.mixed_b).exp() # (12)
241
+ inv_freq = inv_freq / lambda_1_m # (10)
242
+
243
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
244
+
245
+ t = torch.arange(self.max_seq_len_cached, device=device, dtype=torch.float32)
246
+
247
+ freqs = torch.einsum("i,j->ij", t, self.inv_freq)
248
+ # Different from paper, but it uses a different permutation in order to obtain the same calculation
249
+ emb = torch.cat((freqs, freqs), dim=-1)
250
+ self.register_buffer("cos_cached", emb.cos().to(dtype), persistent=False)
251
+ self.register_buffer("sin_cached", emb.sin().to(dtype), persistent=False)
252
+
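+ # The inline markers (6), (10), (12), (13) reference equations in
+ # https://kexue.fm/archives/9706: fixed NTK (mixed_b=None) rescales the base by the
+ # factor and then divides every inverse frequency by scaling_factor ** (2 / dim), while
+ # mixed NTK applies a per-dimension factor lambda so lower frequencies are stretched more.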
253
+
254
+ class RMSNorm(nn.Module):
255
+ def __init__(self, hidden_size, eps=1e-6):
256
+ """
257
+ RMSNorm is equivalent to T5LayerNorm
258
+ """
259
+ super().__init__()
260
+ self.weight = nn.Parameter(torch.ones(hidden_size))
261
+ self.variance_epsilon = eps
262
+
263
+ def forward(self, hidden_states):
264
+ input_dtype = hidden_states.dtype
265
+ hidden_states = hidden_states.to(torch.float32)
266
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
267
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
268
+ return self.weight * hidden_states.to(input_dtype)
269
+
270
+
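+ # RMSNorm in one line: y = weight * x / sqrt(mean(x**2) + eps). Unlike nn.LayerNorm it
+ # subtracts no mean and adds no bias, only a learned per-channel scale.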
271
+ LAYER_NORM = {
272
+ 'layer_norm': nn.LayerNorm,
273
+ 'rms_norm': RMSNorm
274
+ }
275
+
276
+
277
+ class NewEmbeddings(nn.Module):
278
+ """
279
+ Embedding and Unpadding.
280
+ """
281
+
282
+ def __init__(self, config: NewConfig):
283
+ super().__init__()
284
+ self.padding_idx = config.pad_token_id
285
+ self.word_embeddings = nn.Embedding(
286
+ config.vocab_size, config.hidden_size, padding_idx=self.padding_idx
287
+ )
288
+
289
+ self.position_embedding_type = config.position_embedding_type
290
+ if self.position_embedding_type == 'absolute':
291
+ self.position_embeddings = nn.Embedding(
292
+ config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx
293
+ )
294
+ elif self.position_embedding_type == 'rope':
295
+ self._init_rope(config)
296
+ else:
297
+ raise ValueError(f"Unknown position embedding type: {self.position_embedding_type}")
298
+
299
+ self.type_vocab_size = config.type_vocab_size
300
+ if self.type_vocab_size > 0:
301
+ self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size)
302
+
303
+ # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load
304
+ # any TensorFlow checkpoint file
305
+ self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
306
+ self.dropout = nn.Dropout(config.hidden_dropout_prob)
307
+ # position_ids is contiguous in memory and excluded when serialized
308
+ self.register_buffer(
309
+ "position_ids", torch.arange(config.max_position_embeddings), persistent=False
310
+ )
311
+
312
+ def _init_rope(self, config):
313
+ kwargs = dict(
314
+ dim=int(config.hidden_size / config.num_attention_heads),
315
+ max_position_embeddings=config.max_position_embeddings,
316
+ base=config.rope_theta
317
+ )
318
+ if config.rope_scaling is None:
319
+ self.rotary_emb = RotaryEmbedding(**kwargs)
320
+ else:
321
+ kwargs.update(scaling_factor=config.rope_scaling["factor"])
322
+ scaling_type = config.rope_scaling["type"]
323
+ if scaling_type == 'ntk':
324
+ kwargs.update(mixed_b=config.rope_scaling.get('mixed_b', None))
325
+ self.rotary_emb = NTKScalingRotaryEmbedding(**kwargs)
326
+ # elif scaling_type == "linear":
327
+ # self.rotary_emb = LinearScalingRotaryEmbedding(**kwargs)
328
+ # elif scaling_type == "dynamic":
329
+ # self.rotary_emb = DynamicNTKScalingRotaryEmbedding(**kwargs)
330
+ else:
331
+ raise ValueError(f"Unknown RoPE scaling type {scaling_type}")
332
+
333
+ def forward(
334
+ self,
335
+ unpad_inputs: bool,
336
+ input_ids: Optional[torch.Tensor] = None,
337
+ attention_mask: Optional[torch.Tensor] = None,
338
+ length: Optional[List[int]] = None,
339
+ token_type_ids: Optional[torch.Tensor] = None,
340
+ position_ids: Optional[torch.Tensor] = None,
341
+ inputs_embeds: Optional[torch.Tensor] = None,
342
+ ) -> Tuple[torch.Tensor, torch.Tensor, Optional[Tuple], Optional[List[int]]]:
343
+ """
344
+ """
345
+ if inputs_embeds is None:
346
+ device, input_shape = input_ids.device, input_ids.shape
347
+ else:
348
+ device, input_shape = inputs_embeds.device, inputs_embeds.shape[:2]
349
+ batch_size, seq_length = input_shape
350
+
351
+ # Set attention_mask if it's None
352
+ if attention_mask is None:
353
+ attention_mask = torch.ones(input_shape, device=device)
354
+ if length is not None:
355
+ for i, l in enumerate(length):
356
+ attention_mask[i, l:] = 0
357
+
358
+ # Set attention_mask_bool for unpadding
359
+ if unpad_inputs:
360
+ attention_mask_bool = attention_mask.bool()
361
+ if length is None:
362
+ length = attention_mask.sum(-1).tolist()
363
+
364
+ # Get word embeddings
365
+ if inputs_embeds is None:
366
+ if unpad_inputs:
367
+ input_ids = input_ids[attention_mask_bool].unsqueeze(0)
368
+ inputs_embeds = self.word_embeddings(input_ids)
369
+ else:
370
+ if unpad_inputs:
371
+ inputs_embeds = inputs_embeds[attention_mask_bool].unsqueeze(0)
372
+ embeddings = inputs_embeds
373
+
374
+ # Set and unpad position_ids
375
+ if position_ids is None:
376
+ if seq_length > self.position_ids.size(0):
377
+ self.register_buffer(
378
+ "position_ids", torch.arange(seq_length, device=embeddings.device), persistent=False
379
+ )
380
+ if unpad_inputs:
381
+ # [1, cumsum_seq_len]
382
+ position_ids = torch.cat([self.position_ids[:l] for l in length]).unsqueeze(0)
383
+ else:
384
+ # [bs, seq_len]
385
+ position_ids = self.position_ids[:seq_length].expand(batch_size, -1)
386
+ elif unpad_inputs:
387
+ position_ids = position_ids[attention_mask_bool].unsqueeze(0) # [1, cumsum_seq_len]
388
+
389
+ # Compute rotary embedding
390
+ if self.position_embedding_type == 'rope':
391
+ rope_cos, rope_sin = self.rotary_emb(inputs_embeds, seq_len=seq_length)
392
+ rope_cos = rope_cos[position_ids].unsqueeze(2) # [bs, seq_len, 1, dim]
393
+ rope_sin = rope_sin[position_ids].unsqueeze(2) # [bs, seq_len, 1, dim]
394
+ rope_embeds = rope_cos, rope_sin
395
+ else:
396
+ rope_embeds = None
397
+
398
+ if self.type_vocab_size > 0:
399
+ if token_type_ids is None:
400
+ token_type_ids = position_ids.mul(0)
401
+ else:
402
+ if self.type_vocab_size < 2:
403
+ token_type_ids.mul_(0)
404
+ if unpad_inputs:
405
+ token_type_ids = token_type_ids[attention_mask_bool].unsqueeze(0)
406
+
407
+ token_type_embeddings = self.token_type_embeddings(token_type_ids)
408
+ embeddings = embeddings + token_type_embeddings
409
+
410
+ # BERT position
411
+ if self.position_embedding_type == "absolute":
412
+ position_embeddings = self.position_embeddings(position_ids)
413
+ embeddings = embeddings + position_embeddings
414
+
415
+ embeddings = self.LayerNorm(embeddings)
416
+ embeddings = self.dropout(embeddings)
417
+
418
+ return embeddings, attention_mask, rope_embeds, length
419
+
420
+
421
+ class NewAttention(nn.Module):
422
+ def __init__(self, config: NewConfig, pack_qkv=None, use_memory_efficient_attention=None):
423
+ super().__init__()
424
+ self.config = config
425
+ if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"):
426
+ raise ValueError(
427
+ f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention "
428
+ f"heads ({config.num_attention_heads})"
429
+ )
430
+
431
+ self.hidden_size = config.hidden_size
432
+ self.num_attention_heads = config.num_attention_heads
433
+ self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
434
+ self.all_head_size = self.num_attention_heads * self.attention_head_size
435
+
436
+ if pack_qkv is None:
437
+ pack_qkv = config.pack_qkv
438
+ self.pack_qkv = pack_qkv
439
+
440
+ if self.pack_qkv:
441
+ self.qkv_proj = nn.Linear(config.hidden_size, self.all_head_size * 3, bias=True)
442
+ else:
443
+ self.q_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
444
+ self.k_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
445
+ self.v_proj = nn.Linear(config.hidden_size, self.all_head_size, bias=True)
446
+
447
+ self.dropout = nn.Dropout(config.attention_probs_dropout_prob)
448
+ self.o_proj = nn.Linear(config.hidden_size, config.hidden_size, bias=True)
449
+
450
+ if use_memory_efficient_attention is None:
451
+ use_memory_efficient_attention = self.config.use_memory_efficient_attention
452
+ self.use_memory_efficient_attention = use_memory_efficient_attention
453
+ self.memory_efficient_attention = None if xops is None else xops.memory_efficient_attention
454
+ if self.use_memory_efficient_attention:
455
+ assert self.memory_efficient_attention is not None, 'please install xformers'
456
+
457
+ def forward(
458
+ self,
459
+ hidden_states: torch.Tensor,
460
+ attention_bias: torch.FloatTensor,
461
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
462
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
463
+ attention_scale: Optional[torch.FloatTensor] = None,
464
+ head_mask: Optional[torch.FloatTensor] = None,
465
+ output_attentions: Optional[bool] = False,
466
+ qkv_inputs: Optional[Tuple] = None, # For RetroMAE
467
+ ) -> Tuple[torch.Tensor, ...]:
468
+ shape_hd = (self.num_attention_heads, self.attention_head_size)
469
+ # qkv
470
+ if self.pack_qkv and qkv_inputs is None:
471
+ qkv_pack = self.qkv_proj(hidden_states).split(self.all_head_size, dim=-1)
472
+ else:
473
+ if qkv_inputs is None:
474
+ qkv_inputs = (hidden_states, hidden_states, hidden_states)
475
+ qkv_pack = [
476
+ getattr(self, n + '_proj')(s) for s, n in zip(qkv_inputs, 'qkv')
477
+ ]
478
+ query_states, key_states, value_states = [t.view(t.shape[:-1] + shape_hd) for t in qkv_pack]
479
+
480
+ if self.config.position_embedding_type == 'rope':
481
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, *rope_embeds)
482
+
483
+ dtype = query_states.dtype
484
+
485
+ if self.config.logn_attention_scale and attention_scale is not None:
486
+ # https://kexue.fm/archives/8823
487
+ query_states = query_states * attention_scale.to(dtype)
488
+
489
+ if padding_inputs is not None:
490
+ query_states = pad_input(query_states.squeeze(), *padding_inputs)
491
+ key_states = pad_input(key_states.squeeze(), *padding_inputs)
492
+ value_states = pad_input(value_states.squeeze(), *padding_inputs)
493
+
494
+ if self.use_memory_efficient_attention:
495
+ assert self.memory_efficient_attention is not None, "xformers is not loaded"
496
+ assert output_attentions is False, "memory_efficient_attention does not output attentions"
497
+ assert head_mask is None, "head_mask is not supported yet"
498
+ attention_probs = None
499
+ if torch.is_tensor(attention_bias):
500
+ attention_bias = attention_bias.to(dtype)
501
+ context_layer = self.memory_efficient_attention(
502
+ query_states,
503
+ key_states,
504
+ value_states,
505
+ attn_bias=attention_bias,
506
+ p=self.dropout.p
507
+ )
508
+ else:
509
+ if output_attentions and isinstance(self, NewSdpaAttention):
510
+ raise RuntimeError("SDPA does not output attentions")
511
+ context_layer, attention_probs = self._attention(
512
+ query_states, key_states, value_states, attention_bias, head_mask
513
+ )
514
+
515
+ if padding_inputs is not None:
516
+ context_layer = unpad_input(context_layer, indices=padding_inputs[0])
517
+
518
+ new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,)
519
+ context_layer = context_layer.view(new_context_layer_shape)
520
+
521
+ # output proj
522
+ attn_output = self.o_proj(context_layer)
523
+
524
+ # add attentions if we output them
525
+ outputs = (attn_output, attention_probs) if output_attentions else (attn_output,)
526
+ return outputs
527
+
528
+ def _attention(self, query_states, key_states, value_states, attention_bias, head_mask):
529
+ """
530
+ Args:
531
+ q/k/v: (B, L, n_head, head_dim),
532
+ Returns:
533
+ attn_output: (B, L, n_head, head_dim)
534
+ """
535
+ query_states = query_states.transpose(1, 2)
536
+ key_states = key_states.transpose(1, 2)
537
+ value_states = value_states.transpose(1, 2)
538
+ # Take the dot product between "query" and "key" to get the raw attention scores.
539
+ attention_scores = torch.matmul(query_states, key_states.transpose(-1, -2))
540
+
541
+ attention_scores = attention_scores / math.sqrt(self.attention_head_size)
542
+ if attention_bias is not None:
543
+ # Apply the attention mask is (precomputed for all layers in BertModel forward() function)
544
+ attention_scores = attention_scores + attention_bias
545
+
546
+ # Normalize the attention scores to probabilities.
547
+ attention_probs = nn.functional.softmax(attention_scores, dim=-1)
548
+
549
+ # This is actually dropping out entire tokens to attend to, which might
550
+ # seem a bit unusual, but is taken from the original Transformer paper.
551
+ if self.dropout.p > 0:
552
+ attention_probs = self.dropout(attention_probs)
553
+
554
+ # Mask heads if we want to
555
+ if head_mask is not None:
556
+ attention_probs = attention_probs * head_mask
557
+
558
+ context_layer = torch.matmul(attention_probs, value_states)
559
+
560
+ context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
561
+ return context_layer, attention_probs
562
+
563
+
564
+ class NewSdpaAttention(NewAttention):
565
+ """
566
+ New attention module using torch.nn.functional.scaled_dot_product_attention. This module inherits from
567
+ `NewAttention` as the weights of the module stay untouched. The only changes are in the forward pass to adapt to the
568
+ SDPA API.
569
+ """
570
+ def __init__(self, config: NewConfig, **kwargs):
571
+ super().__init__(config, **kwargs)
572
+ # torch.backends.cuda.enable_mem_efficient_sdp(False)
573
+ # logger.warning(
574
+ # "Disable memory efficient attention kernel for `NewSdpaAttention`, you can set "
575
+ # "`use_memory_efficient_attention=True` if it expected to use."
576
+ # )
577
+
578
+ def _attention(self, query_states, key_states, value_states, attention_bias, head_mask):
579
+ attn_output = torch.nn.functional.scaled_dot_product_attention(
580
+ query_states.transpose(1, 2),
581
+ key_states.transpose(1, 2),
582
+ value_states.transpose(1, 2),
583
+ attn_mask=attention_bias,
584
+ dropout_p=self.dropout.p if self.training else 0.0,
585
+ )
586
+ attn_output = attn_output.permute(0, 2, 1, 3).contiguous()
587
+ return attn_output, None
588
+
589
+
590
+ NEW_ATTENTION_CLASSES = {
591
+ "eager": NewAttention,
592
+ # "flash_attention_2": , # TODO
593
+ "sdpa": NewSdpaAttention,
594
+ }
595
+
596
+
597
+ class NewGatedMLP(nn.Module):
598
+ """
599
+ GLU Variants Improve Transformer.
600
+ """
601
+
602
+ def __init__(self, config: NewConfig):
603
+ super().__init__()
604
+ self.intermediate_size = config.intermediate_size
605
+ self.up_gate_proj = nn.Linear(config.hidden_size, self.intermediate_size * 2, bias=False)
606
+ self.down_proj = nn.Linear(self.intermediate_size, config.hidden_size, bias=True)
607
+ self.act_fn = ACT2FN[config.hidden_act]
608
+ if config.hidden_dropout_prob > 0:
609
+ self.hidden_dropout = nn.Dropout(config.hidden_dropout_prob)
610
+ else:
611
+ self.hidden_dropout = None
612
+
613
+ def forward(self, hidden_states):
614
+ up_gate = self.up_gate_proj(hidden_states)
615
+ up_states, gate = torch.split(up_gate, self.intermediate_size, dim=-1)
616
+ gate = self.act_fn(gate)
617
+ gated_states = gate * up_states
618
+ if self.hidden_dropout is not None:
619
+ gated_states = self.hidden_dropout(gated_states)
620
+ down_states = self.down_proj(gated_states)
621
+ return down_states
622
+
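+ # GLU-variant MLP in one line: y = W_down(act(gate) * up), where `up` and `gate` are
+ # the two halves of the fused up_gate_proj output (cf. "GLU Variants Improve Transformer").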
623
+
624
+ class NewLayer(nn.Module):
625
+ def __init__(
626
+ self,
627
+ config: NewConfig,
628
+ pack_qkv=None,
629
+ use_memory_efficient_attention=None,
630
+ attn_implementation=None
631
+ ):
632
+ super().__init__()
633
+ if attn_implementation is None:
634
+ attn_implementation = config._attn_implementation
635
+ if use_memory_efficient_attention is None:
636
+ use_memory_efficient_attention = config.use_memory_efficient_attention
637
+ if use_memory_efficient_attention:
638
+ if attn_implementation != 'eager':
639
+ logger.warning_once(f"Override {attn_implementation=} to 'eager' as {use_memory_efficient_attention=}")
640
+ attn_implementation = 'eager' # Since it will be SDPA by default for torch>=2.1.1
641
+ self.attention = NEW_ATTENTION_CLASSES[attn_implementation](
642
+ config, pack_qkv=pack_qkv, use_memory_efficient_attention=use_memory_efficient_attention
643
+ )
644
+ self.mlp = NewGatedMLP(config)
645
+
646
+ ln_class = LAYER_NORM[config.layer_norm_type]
647
+ self.attn_ln = ln_class(config.hidden_size, eps=config.layer_norm_eps)
648
+ self.mlp_ln = ln_class(config.hidden_size, eps=config.layer_norm_eps)
649
+
650
+ if config.hidden_dropout_prob > 0:
651
+ self.hidden_dropout = nn.Dropout(config.hidden_dropout_prob)
652
+ else:
653
+ self.hidden_dropout = None
654
+
655
+ def forward(
656
+ self,
657
+ hidden_states: torch.Tensor,
658
+ attention_bias: torch.FloatTensor,
659
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
660
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
661
+ attention_scale: Optional[torch.FloatTensor] = None,
662
+ subset_indices: Optional[torch.LongTensor] = None,
663
+ head_mask: Optional[torch.FloatTensor] = None,
664
+ output_attentions: Optional[bool] = False,
665
+ qkv_inputs: Optional[Tuple] = None, # For RetroMAE
666
+ ) -> Tuple[torch.Tensor, ...]:
667
+ # Multi head self attention
668
+ residual = hidden_states if qkv_inputs is None else qkv_inputs[0]
669
+ attention_outputs = self.attention(
670
+ hidden_states,
671
+ attention_bias,
672
+ rope_embeds,
673
+ padding_inputs,
674
+ attention_scale,
675
+ head_mask,
676
+ output_attentions=output_attentions,
677
+ qkv_inputs=qkv_inputs,
678
+ )
679
+ hidden_states = attention_outputs[0]
680
+ if self.hidden_dropout is not None:
681
+ hidden_states = self.hidden_dropout(hidden_states)
682
+ hidden_states = residual + hidden_states
683
+
684
+ # In pretraining, after the attention of last layer, we only need the masked tokens.
685
+ if subset_indices is not None:
686
+ hidden_states = hidden_states[subset_indices]
687
+
688
+ hidden_states = self.attn_ln(hidden_states)
689
+
690
+ # Fully Connected
691
+ residual = hidden_states
692
+ hidden_states = self.mlp(hidden_states)
693
+ if self.hidden_dropout is not None:
694
+ hidden_states = self.hidden_dropout(hidden_states)
695
+ hidden_states = residual + hidden_states
696
+ hidden_states = self.mlp_ln(hidden_states)
697
+
698
+ # add self attentions if we output attention weights
699
+ outputs = (hidden_states,) + attention_outputs[1:]
700
+ return outputs
701
+
702
+
703
+ class NewEncoder(nn.Module):
704
+ def __init__(self, config):
705
+ super().__init__()
706
+ self.config = config
707
+ self.layer = nn.ModuleList([NewLayer(config) for _ in range(config.num_hidden_layers)])
708
+ self.gradient_checkpointing = False
709
+
710
+ def forward(
711
+ self,
712
+ hidden_states: torch.Tensor,
713
+ attention_bias: Optional[torch.FloatTensor] = None,
714
+ rope_embeds: Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] = None,
715
+ padding_inputs: Optional[Tuple] = None, # indices, batch, seqlen
716
+ attention_scale: Optional[torch.FloatTensor] = None,
717
+ subset_indices: Optional[torch.LongTensor] = None,
718
+ head_mask: Optional[torch.FloatTensor] = None,
719
+ output_attentions: Optional[bool] = False,
720
+ output_hidden_states: Optional[bool] = False,
721
+ return_dict: Optional[bool] = True,
722
+ ) -> Union[Tuple[torch.Tensor], BaseModelOutput]:
723
+ all_hidden_states = () if output_hidden_states else None
724
+ all_self_attentions = () if output_attentions else None
725
+
726
+ for i, layer_module in enumerate(self.layer):
727
+ if output_hidden_states:
728
+ all_hidden_states = all_hidden_states + (hidden_states,)
729
+
730
+ if i >= len(self.layer) - 1:
731
+ layer_subset_indices = subset_indices
732
+ else:
733
+ layer_subset_indices = None
734
+
735
+ layer_head_mask = head_mask[i] if head_mask is not None else None
736
+
737
+ if self.gradient_checkpointing and self.training:
738
+ layer_outputs = self._gradient_checkpointing_func(
739
+ layer_module.__call__,
740
+ hidden_states,
741
+ attention_bias,
742
+ rope_embeds,
743
+ padding_inputs,
744
+ attention_scale,
745
+ layer_subset_indices,
746
+ layer_head_mask,
747
+ )
748
+ else:
749
+ layer_outputs = layer_module(
750
+ hidden_states,
751
+ attention_bias,
752
+ rope_embeds,
753
+ padding_inputs,
754
+ attention_scale,
755
+ layer_subset_indices,
756
+ layer_head_mask,
757
+ output_attentions,
758
+ )
759
+
760
+ hidden_states = layer_outputs[0]
761
+ if output_attentions:
762
+ all_self_attentions = all_self_attentions + (layer_outputs[1],)
763
+
764
+ if output_hidden_states:
765
+ all_hidden_states = all_hidden_states + (hidden_states,)
766
+
767
+ if not return_dict:
768
+ return tuple(
769
+ v
770
+ for v in [
771
+ hidden_states,
772
+ all_hidden_states,
773
+ all_self_attentions,
774
+ ]
775
+ if v is not None
776
+ )
777
+ return BaseModelOutput(
778
+ last_hidden_state=hidden_states,
779
+ hidden_states=all_hidden_states,
780
+ attentions=all_self_attentions,
781
+ )
782
+
783
+
784
+ # Copied from transformers.models.bert.modeling_bert.BertPooler with Bert->New
785
+ class NewPooler(nn.Module):
786
+ def __init__(self, config):
787
+ super().__init__()
788
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
789
+ self.activation = nn.Tanh()
790
+
791
+ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
792
+ # We "pool" the model by simply taking the hidden state corresponding
793
+ # to the first token.
794
+ first_token_tensor = hidden_states[:, 0]
795
+ pooled_output = self.dense(first_token_tensor)
796
+ pooled_output = self.activation(pooled_output)
797
+ return pooled_output
798
+
799
+
800
+ class NewPreTrainedModel(PreTrainedModel):
801
+ """
802
+ An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
803
+ models.
804
+ """
805
+
806
+ config_class = NewConfig
807
+ base_model_prefix = "new"
808
+ supports_gradient_checkpointing = True
809
+ _supports_sdpa = True
810
+
811
+ def _init_weights(self, module):
812
+ """Initialize the weights"""
813
+ if isinstance(module, nn.Linear):
814
+ # Slightly different from the TF version which uses truncated_normal for initialization
815
+ # cf https://github.com/pytorch/pytorch/pull/5617
816
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
817
+ if module.bias is not None:
818
+ module.bias.data.zero_()
819
+ elif isinstance(module, nn.Embedding):
820
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
821
+ if module.padding_idx is not None:
822
+ module.weight.data[module.padding_idx].zero_()
823
+ elif isinstance(module, nn.LayerNorm):
824
+ module.bias.data.zero_()
825
+ module.weight.data.fill_(1.0)
826
+
827
+
828
+ class NewModel(NewPreTrainedModel):
829
+ """
830
+ The bare New Model transformer outputting raw hidden-states without any specific head on top.
831
+ """
832
+
833
+ def __init__(self, config: NewConfig, add_pooling_layer=False):
834
+ super().__init__(config)
835
+ self.config = config
836
+
837
+ self.embeddings = NewEmbeddings(config)
838
+ self.encoder = NewEncoder(config)
839
+
840
+ self.pooler = NewPooler(config) if add_pooling_layer else None
841
+
842
+ # Initialize weights and apply final processing
843
+ self.post_init()
844
+
845
+ def get_input_embeddings(self):
846
+ return self.embeddings.word_embeddings
847
+
848
+ def set_input_embeddings(self, value):
849
+ self.embeddings.word_embeddings = value
850
+
851
+ def forward(
852
+ self,
853
+ input_ids: Optional[torch.Tensor] = None,
854
+ attention_mask: Optional[torch.Tensor] = None,
855
+ length: Optional[List[int]] = None,
856
+ subset_indices: Optional[torch.LongTensor] = None,
857
+ token_type_ids: Optional[torch.Tensor] = None,
858
+ position_ids: Optional[torch.Tensor] = None,
859
+ head_mask: Optional[torch.Tensor] = None,
860
+ inputs_embeds: Optional[torch.Tensor] = None,
861
+ output_attentions: Optional[bool] = None,
862
+ output_hidden_states: Optional[bool] = None,
863
+ return_dict: Optional[bool] = None,
864
+ unpad_inputs: Optional[bool] = None,
865
+ ) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPooling]:
866
+ r"""
867
+ length (`list` of length `batch_size`, *optional*):
868
+ If `None`, the padded `last_hidden_state` is returned.
869
+ subset_indices (`torch.LongTensor`, *optional*):
870
+ Indices of tokens to keep after the last encoder layer (used in pretraining to return only the masked tokens).
871
+ unpad_inputs (`bool`, *optional*):
872
+ Whether to strip padding tokens before the encoder and re-pad the output; defaults to `config.unpad_inputs`.
873
+ """
874
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
875
+ output_hidden_states = (
876
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
877
+ )
878
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
879
+ unpad_inputs = unpad_inputs if unpad_inputs is not None else self.config.unpad_inputs
880
+ output_padded = length is None
881
+
882
+ if input_ids is not None and inputs_embeds is not None:
883
+ raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
884
+ elif input_ids is not None:
885
+ self.warn_if_padding_and_no_attention_mask(input_ids, attention_mask)
886
+ input_shape = input_ids.size()
887
+ elif inputs_embeds is not None:
888
+ input_shape = inputs_embeds.size()[:-1]
889
+ else:
890
+ raise ValueError("You have to specify either input_ids or inputs_embeds")
891
+
892
+ # TODO: not used
893
+ # # Prepare head mask if needed
894
+ # # 1.0 in head_mask indicate we keep the head
895
+ # # attention_probs has shape bsz x n_heads x N x N
896
+ # # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
897
+ # # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
898
+ # head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
899
+
900
+ # Get embeddings, may unpad them
901
+ (embedding_output, attention_mask, rope_embeds, length) = self.embeddings(
902
+ unpad_inputs,
903
+ input_ids=input_ids,
904
+ attention_mask=attention_mask,
905
+ length=length,
906
+ token_type_ids=token_type_ids,
907
+ position_ids=position_ids,
908
+ inputs_embeds=inputs_embeds
909
+ )
910
+
911
+ batch_size, seq_length = input_shape
912
+ if unpad_inputs and self.config.use_memory_efficient_attention:
913
+ attention_bias = xops.fmha.attn_bias.BlockDiagonalMask.from_seqlens(length)
914
+ else:
915
+ # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length]
916
+ # ourselves in which case we just need to make it broadcastable to all heads.
917
+ attention_bias = self.get_extended_attention_mask(attention_mask, input_shape)
918
+ if self.config.use_memory_efficient_attention:
919
+ # xformers' memory-efficient attention expects a bias of shape (bs, n_heads, seq_len, seq_len), e.g. (48, 12, 512, 512), so expand the broadcastable [bs, 1, 1, seq_len] mask.
920
+ attention_bias = attention_bias.expand(-1, self.config.num_attention_heads, seq_length, -1)
921
+
922
+ padding_inputs = None
923
+ if unpad_inputs and (output_padded or not self.config.use_memory_efficient_attention):
924
+ indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
925
+ if not self.config.use_memory_efficient_attention:
926
+ padding_inputs = (indices, *input_shape)
927
+
928
+ attention_scale = None
929
+ if self.config.logn_attention_scale:
930
+ logger.warning_once("TODO: logn_attention_scale")
931
+ # # attention scale log_512(input_len)
932
+ # attention_scale = attention_mask.sum(1).log() / torch.tensor(self.config.max_position_embeddings).log()
933
+ # # inference-time logn scale need clip 1
934
+ # if self.config.logn_attention_clip1:
935
+ # attention_scale.clip_(1)
936
+ # attention_scale = attention_scale[:, None, None, None]
937
+ # else:
938
+ # attention_scale = None
939
+
940
+ encoder_outputs = self.encoder(
941
+ embedding_output,
942
+ attention_bias=attention_bias,
943
+ rope_embeds=rope_embeds,
944
+ padding_inputs=padding_inputs,
945
+ attention_scale=attention_scale,
946
+ subset_indices=subset_indices,
947
+ head_mask=head_mask,
948
+ output_attentions=output_attentions,
949
+ output_hidden_states=output_hidden_states,
950
+ return_dict=return_dict,
951
+ )
952
+ sequence_output = encoder_outputs[0]
953
+ if unpad_inputs and output_padded:
954
+ sequence_output = pad_input(
955
+ sequence_output.squeeze(), indices, batch_size, seq_length
956
+ )
957
+
958
+ pooled_output = self.pooler(sequence_output) if self.pooler is not None else None
959
+
960
+ if not return_dict:
961
+ return (sequence_output, pooled_output) + encoder_outputs[1:]
962
+
963
+ return BaseModelOutputWithPooling(
964
+ last_hidden_state=sequence_output,
965
+ pooler_output=pooled_output,
966
+ hidden_states=encoder_outputs.hidden_states,
967
+ attentions=encoder_outputs.attentions,
968
+ )
969
+
970
+
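A minimal sketch of how this `forward` is typically driven (the repo id below is a placeholder, and `trust_remote_code=True` is assumed because `NewModel` ships as repository code):

import torch
from transformers import AutoModel, AutoTokenizer

repo = "some-org/some-new-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo, trust_remote_code=True)

batch = tokenizer(
    ["a short sentence", "a somewhat longer second sentence"],
    padding=True,
    return_tensors="pt",
)
with torch.no_grad():
    # length is None here, so output_padded is True and a padded
    # last_hidden_state comes back even if the encoder unpads internally.
    out = model(**batch)
print(out.last_hidden_state.shape)  # (2, seq_len, hidden_size)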
+ class NewLMPredictionHead(nn.Module):
+     def __init__(self, config):
+         super().__init__()
+         self.dense = nn.Linear(config.hidden_size, config.hidden_size)
+         self.transform_act_fn = ACT2FN[config.hidden_act]
+         self.norm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)
+
+         # The output weights are the same as the input embeddings, but there is
+         # an output-only bias for each token.
+         self.decoder = nn.Linear(config.hidden_size, config.vocab_size)
+
+     def forward(self, hidden_states):
+         hidden_states = self.dense(hidden_states)
+         hidden_states = self.transform_act_fn(hidden_states)
+         hidden_states = self.norm(hidden_states)
+         hidden_states = self.decoder(hidden_states)
+         return hidden_states
+
+
+ class NewForMaskedLM(NewPreTrainedModel):
+     _tied_weights_keys = ["lm_head.decoder.bias", "lm_head.decoder.weight"]
+
+     def __init__(self, config: NewConfig):
+         super().__init__(config)
+         self.new = NewModel(config, add_pooling_layer=False)
+         self.lm_head = NewLMPredictionHead(config)
+         self.loss_fct = nn.CrossEntropyLoss()
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def get_output_embeddings(self):
+         return self.lm_head.decoder
+
+     def set_output_embeddings(self, new_embeddings):
+         self.lm_head.decoder = new_embeddings
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], MaskedLMOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+             Labels for computing the masked language modeling loss. Indices should be in
+             `[-100, 0, ..., config.vocab_size]` (see the `input_ids` docstring). Tokens with
+             indices set to `-100` are ignored (masked); the loss is only computed for tokens
+             with labels in `[0, ..., config.vocab_size]`.
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         if labels is None or not self.new.config.unpad_inputs:
+             length = None
+             subset_indices = None
+         else:
+             length = attention_mask.sum(-1).tolist()
+             labels = labels[attention_mask.bool()].unsqueeze(0)
+             subset_indices = labels > -100
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             length=length,
+             subset_indices=subset_indices,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         sequence_output = outputs[0]
+         prediction_scores = self.lm_head(sequence_output)
+
+         masked_lm_loss = None
+         if labels is not None:
+             if subset_indices is None:
+                 mask = attention_mask.bool()
+                 prediction_scores = prediction_scores[mask]
+                 labels = labels[mask]
+             else:
+                 labels = labels[subset_indices]
+             masked_lm_loss = self.loss_fct(prediction_scores, labels)
+
+         if not return_dict:
+             output = (prediction_scores,) + outputs[2:]
+             return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
+
+         return MaskedLMOutput(
+             loss=masked_lm_loss,
+             logits=prediction_scores,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
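When `config.unpad_inputs` is set, the branch above flattens the labels and passes `subset_indices` so the head only scores masked positions. A hedged sketch of the loss path (placeholder repo id; position 4 is an arbitrary token to mask):

from transformers import AutoModelForMaskedLM, AutoTokenizer

repo = "some-org/some-new-model"  # placeholder
tok = AutoTokenizer.from_pretrained(repo)
mlm = AutoModelForMaskedLM.from_pretrained(repo, trust_remote_code=True)

enc = tok("Paris is the capital of France.", return_tensors="pt")
labels = enc.input_ids.clone()
masked_ids = enc.input_ids.clone()
masked_ids[0, 4] = tok.mask_token_id            # mask one arbitrary position
labels[masked_ids != tok.mask_token_id] = -100  # loss only on the masked token
out = mlm(input_ids=masked_ids, attention_mask=enc.attention_mask, labels=labels)
print(out.loss)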
1079
+ class NewForSequenceClassification(NewPreTrainedModel):
1080
+ def __init__(self, config):
1081
+ super().__init__(config)
1082
+ self.num_labels = config.num_labels
1083
+ self.config = config
1084
+
1085
+ self.new = NewModel(config, add_pooling_layer=True)
1086
+ classifier_dropout = (
1087
+ config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
1088
+ )
1089
+ self.dropout = nn.Dropout(classifier_dropout)
1090
+ self.classifier = nn.Linear(config.hidden_size, config.num_labels)
1091
+
1092
+ # Initialize weights and apply final processing
1093
+ self.post_init()
1094
+
1095
+ def forward(
1096
+ self,
1097
+ input_ids: Optional[torch.Tensor] = None,
1098
+ attention_mask: Optional[torch.Tensor] = None,
1099
+ token_type_ids: Optional[torch.Tensor] = None,
1100
+ position_ids: Optional[torch.Tensor] = None,
1101
+ head_mask: Optional[torch.Tensor] = None,
1102
+ inputs_embeds: Optional[torch.Tensor] = None,
1103
+ labels: Optional[torch.Tensor] = None,
1104
+ output_attentions: Optional[bool] = None,
1105
+ output_hidden_states: Optional[bool] = None,
1106
+ return_dict: Optional[bool] = None,
1107
+ unpad_inputs: Optional[bool] = None,
1108
+ ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
1109
+ r"""
1110
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1111
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
1112
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
1113
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
1114
+ """
1115
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1116
+
1117
+ outputs = self.new(
1118
+ input_ids,
1119
+ attention_mask=attention_mask,
1120
+ token_type_ids=token_type_ids,
1121
+ position_ids=position_ids,
1122
+ head_mask=head_mask,
1123
+ inputs_embeds=inputs_embeds,
1124
+ output_attentions=output_attentions,
1125
+ output_hidden_states=output_hidden_states,
1126
+ return_dict=return_dict,
1127
+ unpad_inputs=unpad_inputs,
1128
+ )
1129
+
1130
+ pooled_output = outputs[1]
1131
+
1132
+ pooled_output = self.dropout(pooled_output)
1133
+ logits = self.classifier(pooled_output)
1134
+
1135
+ loss = None
1136
+ if labels is not None:
1137
+ if self.config.problem_type is None:
1138
+ if self.num_labels == 1:
1139
+ self.config.problem_type = "regression"
1140
+ elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
1141
+ self.config.problem_type = "single_label_classification"
1142
+ else:
1143
+ self.config.problem_type = "multi_label_classification"
1144
+
1145
+ if self.config.problem_type == "regression":
1146
+ loss_fct = nn.MSELoss()
1147
+ if self.num_labels == 1:
1148
+ loss = loss_fct(logits.squeeze(), labels.squeeze())
1149
+ else:
1150
+ loss = loss_fct(logits, labels)
1151
+ elif self.config.problem_type == "single_label_classification":
1152
+ loss_fct = nn.CrossEntropyLoss()
1153
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
1154
+ elif self.config.problem_type == "multi_label_classification":
1155
+ loss_fct = nn.BCEWithLogitsLoss()
1156
+ loss = loss_fct(logits, labels)
1157
+
1158
+ if not return_dict:
1159
+ output = (logits,) + outputs[2:]
1160
+ return ((loss,) + output) if loss is not None else output
1161
+
1162
+ return SequenceClassifierOutput(
1163
+ loss=loss,
1164
+ logits=logits,
1165
+ hidden_states=outputs.hidden_states,
1166
+ attentions=outputs.attentions,
1167
+ )
1168
+
1169
+
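The `problem_type` dispatch above follows the standard Hugging Face convention; a self-contained sketch of the three label shapes it expects:

import torch
import torch.nn as nn

logits = torch.randn(2, 3)  # (batch_size, num_labels)

# single_label_classification: integer class ids of shape (batch_size,)
ce = nn.CrossEntropyLoss()(logits, torch.tensor([0, 2]))

# multi_label_classification: float multi-hot targets of shape (batch_size, num_labels)
bce = nn.BCEWithLogitsLoss()(logits, torch.tensor([[1., 0., 1.], [0., 1., 0.]]))

# regression (num_labels == 1): float targets, squeezed before MSE
reg_logits = torch.randn(2, 1)
mse = nn.MSELoss()(reg_logits.squeeze(), torch.tensor([0.5, -0.2]))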
+ class NewForMultipleChoice(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+
+         self.new = NewModel(config, add_pooling_layer=True)
+         classifier_dropout = (
+             config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+         )
+         self.dropout = nn.Dropout(classifier_dropout)
+         self.classifier = nn.Linear(config.hidden_size, 1)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], MultipleChoiceModelOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for computing the multiple choice classification loss. Indices should be in
+             `[0, ..., num_choices - 1]` where `num_choices` is the size of the second dimension
+             of the input tensors. (See `input_ids` above)
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+         num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1]
+
+         input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None
+         attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None
+         token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None
+         position_ids = position_ids.view(-1, position_ids.size(-1)) if position_ids is not None else None
+         inputs_embeds = (
+             inputs_embeds.view(-1, inputs_embeds.size(-2), inputs_embeds.size(-1))
+             if inputs_embeds is not None
+             else None
+         )
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         pooled_output = outputs[1]
+
+         pooled_output = self.dropout(pooled_output)
+         logits = self.classifier(pooled_output)
+         reshaped_logits = logits.view(-1, num_choices)
+
+         loss = None
+         if labels is not None:
+             loss_fct = nn.CrossEntropyLoss()
+             loss = loss_fct(reshaped_logits, labels)
+
+         if not return_dict:
+             output = (reshaped_logits,) + outputs[2:]
+             return ((loss,) + output) if loss is not None else output
+
+         return MultipleChoiceModelOutput(
+             loss=loss,
+             logits=reshaped_logits,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
+ @dataclass
+ class NewTokenClassifierOutput(ModelOutput):
+     loss: Optional[torch.FloatTensor] = None
+     logits: torch.FloatTensor = None
+     last_hidden_state: torch.FloatTensor = None
+     hidden_states: Optional[Tuple[torch.FloatTensor, ...]] = None
+     attentions: Optional[Tuple[torch.FloatTensor, ...]] = None
+
+
+ class NewForTokenClassification(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+         self.num_labels = config.num_labels
+
+         self.new = NewModel(config, add_pooling_layer=False)
+         classifier_dropout = (
+             config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
+         )
+         self.dropout = nn.Dropout(classifier_dropout)
+         self.classifier = nn.Linear(config.hidden_size, config.num_labels)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         labels: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], NewTokenClassifierOutput]:
+         r"""
+         labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+             Labels for computing the token classification loss. Indices should be in
+             `[0, ..., config.num_labels - 1]`.
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         sequence_output = outputs[0]
+
+         sequence_output = self.dropout(sequence_output)
+         logits = self.classifier(sequence_output)
+
+         loss = None
+         if labels is not None:
+             loss_fct = nn.CrossEntropyLoss()
+             loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
+
+         if not return_dict:
+             output = (logits,) + outputs[2:]
+             return ((loss,) + output) if loss is not None else output
+
+         return NewTokenClassifierOutput(
+             loss=loss,
+             logits=logits,
+             last_hidden_state=sequence_output,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
+
+
+ class NewForQuestionAnswering(NewPreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+         self.num_labels = config.num_labels
+
+         self.new = NewModel(config, add_pooling_layer=False)
+         self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.Tensor] = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         token_type_ids: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.Tensor] = None,
+         head_mask: Optional[torch.Tensor] = None,
+         inputs_embeds: Optional[torch.Tensor] = None,
+         start_positions: Optional[torch.Tensor] = None,
+         end_positions: Optional[torch.Tensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         unpad_inputs: Optional[bool] = None,
+     ) -> Union[Tuple[torch.Tensor], QuestionAnsweringModelOutput]:
+         r"""
+         start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for the position (index) of the start of the labelled span, used to compute
+             the token classification loss. Positions are clamped to the length of the sequence
+             (`sequence_length`); positions outside of the sequence are not taken into account
+             for computing the loss.
+         end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for the position (index) of the end of the labelled span, used to compute
+             the token classification loss. Positions are clamped to the length of the sequence
+             (`sequence_length`); positions outside of the sequence are not taken into account
+             for computing the loss.
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.new(
+             input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=return_dict,
+             unpad_inputs=unpad_inputs,
+         )
+
+         sequence_output = outputs[0]
+
+         logits = self.qa_outputs(sequence_output)
+         start_logits, end_logits = logits.split(1, dim=-1)
+         start_logits = start_logits.squeeze(-1).contiguous()
+         end_logits = end_logits.squeeze(-1).contiguous()
+
+         total_loss = None
+         if start_positions is not None and end_positions is not None:
+             # If we are on multi-GPU, split adds a dimension
+             if len(start_positions.size()) > 1:
+                 start_positions = start_positions.squeeze(-1)
+             if len(end_positions.size()) > 1:
+                 end_positions = end_positions.squeeze(-1)
+             # Sometimes the start/end positions are outside our model inputs; we ignore these terms
+             ignored_index = start_logits.size(1)
+             start_positions = start_positions.clamp(0, ignored_index)
+             end_positions = end_positions.clamp(0, ignored_index)
+
+             loss_fct = nn.CrossEntropyLoss(ignore_index=ignored_index)
+             start_loss = loss_fct(start_logits, start_positions)
+             end_loss = loss_fct(end_logits, end_positions)
+             total_loss = (start_loss + end_loss) / 2
+
+         if not return_dict:
+             output = (start_logits, end_logits) + outputs[2:]
+             return ((total_loss,) + output) if total_loss is not None else output
+
+         return QuestionAnsweringModelOutput(
+             loss=total_loss,
+             start_logits=start_logits,
+             end_logits=end_logits,
+             hidden_states=outputs.hidden_states,
+             attentions=outputs.attentions,
+         )
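All of the task heads above wrap the same `NewModel` trunk under the attribute `new`, so they load through the usual `Auto*` entry points when the checkpoint's `config.json` registers them in `auto_map`. A hedged sketch (placeholder repo id; assumes that registration exists):

from transformers import AutoModelForSequenceClassification

clf = AutoModelForSequenceClassification.from_pretrained(
    "some-org/some-new-model",  # placeholder
    num_labels=3,
    trust_remote_code=True,
)
print(type(clf).__name__)  # NewForSequenceClassification, if auto_map is set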
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
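modules.json declares the sentence-transformers pipeline: the Transformer encoder at the repo root, CLS pooling from 1_Pooling, and L2 normalization from 2_Normalize. A rough manual equivalent (placeholder repo id; `trust_remote_code` forwarded through `model_args`/`tokenizer_args`):

from sentence_transformers import SentenceTransformer, models

word = models.Transformer(
    "some-org/some-new-model",  # placeholder
    max_seq_length=1024,
    model_args={"trust_remote_code": True},
    tokenizer_args={"trust_remote_code": True},
)
pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="cls")
norm = models.Normalize()
model = SentenceTransformer(modules=[word, pool, norm])
emb = model.encode(["an example sentence"])  # unit-norm 768-d vectors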
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 1024,
+   "do_lower_case": false
+ }
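sentence-transformers reads this file at load time to cap input length; the cap can be inspected or tightened afterwards (placeholder repo id; `trust_remote_code=True` assumed for the custom model code):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("some-org/some-new-model", trust_remote_code=True)
print(model.max_seq_length)  # 1024, from sentence_bert_config.json
model.max_seq_length = 512   # optionally lower the cap for faster inference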
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
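These are the standard XLM-RoBERTa special tokens; `<s>` doubles as the CLS token and `</s>` as SEP, so encoded sequences are wrapped accordingly. A quick check (placeholder repo id):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-org/some-new-model")  # placeholder
ids = tok("hello")["input_ids"]
print(tok.convert_ids_to_tokens(ids))  # ['<s>', ..., '</s>']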
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e802fe5337779428818439760a1e6161ed36ceed72d4ebcbda9c139a2108fc99
+ size 17082988
tokenizer_config.json ADDED
@@ -0,0 +1,61 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "max_length": 1024,
+   "model_max_length": 1024,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<unk>"
+ }
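With `tokenizer_class` set to `XLMRobertaTokenizer` and `model_max_length` at 1024, inputs longer than the window are truncated from the right when truncation is requested. A sketch (placeholder repo id):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-org/some-new-model")  # placeholder
long_text = "word " * 5000
enc = tok(long_text, truncation=True)  # capped at model_max_length (1024)
print(len(enc["input_ids"]))           # 1024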