Commits · polygraf-ai/article

add semantic scholar

b51be98

minko186 committed on Sep 30, 2024

initial commit

9a9aac4

minko186 committed on Sep 27, 2024

chore: increase of numbers to scrape; disabled PDF check in scholar model

a6fbfb6

eljanmahammadli committed on Sep 26, 2024

added pagination to google search, now retrieving more sites

5650543

eljanmahammadli committed on Sep 26, 2024

#feat added simplest scholar mode

fa3e7dd

eljanmahammadli committed on Sep 24, 2024

#feat added reference section to the end

8c8c07f

eljanmahammadli committed on Sep 24, 2024

#feat: added YouTube as RAG input; removed standard humanizer

744d9e3

eljanmahammadli committed on Sep 24, 2024

#perf: quality improvements to website scrape + PDF detect logic

d904dd4

eljanmahammadli committed on Sep 23, 2024

#perf added hybrid search using bm25 + semantic, minor change to text, splitter, and retrieval hyperparameters

8b9c9ff

eljanmahammadli committed on Sep 23, 2024

#fix added headers that bypasses response code 418

bf1e0a0

eljanmahammadli committed on Sep 20, 2024

#bugfix response 301 is solved, as we should explicitly set follow_redirects for httpx

6f4a113

eljanmahammadli committed on Sep 20, 2024

Update requirements.txt

c359315
verified

aliasgerovs committed on Sep 18, 2024

fix citation id ordering

3f69766

eljanmahammadli committed on Sep 12, 2024

update

2600e48

eljanmahammadli committed on Sep 9, 2024

Merge branch 'staging'

b26a983

eljanmahammadli committed on Sep 7, 2024

decreased num max docs before retrieval

80a07a7

eljanmahammadli committed on Sep 7, 2024

inline citations and more

593bb22

eljanmahammadli committed on Sep 6, 2024

fixes to inline citation, html; added debug true, chroma db clean cache

ba91632

eljanmahammadli committed on Sep 6, 2024

update popups to be more centered

7c04c25

minko186 committed on Sep 5, 2024

add popup html + report humanizer section

eeb907d

minko186 committed on Sep 5, 2024

update gitignore

88a1d09

eljanmahammadli committed on Sep 5, 2024

fix double space on generated text + changed humanizer to batched

24a0ba5

minko186 committed on Sep 4, 2024

cleaned up output format + switch all records of text to new format

c1769c1

minko186 committed on Aug 30, 2024

updates on prompt + better error handling

f6b1cb0

minko186 committed on Aug 29, 2024

add inline citations + page content

e76dfe8

minko186 committed on Aug 28, 2024

Added 5 times bigger embedding model all-mpnet-base-v2

95168db

eljanmahammadli committed on Aug 24, 2024

added topic name to save to GCS

cfad98b

eljanmahammadli committed on Aug 24, 2024

remove content_string (not used) + clean unicode non-printable chars + add pymupdf reading for pdf urls

a62cc34

minko186 committed on Aug 23, 2024

added new decoder only LM as a humanizer + UI support

e2a79fa

eljanmahammadli committed on Aug 23, 2024

added gcp command

db77dd7

eljanmahammadli committed on Aug 19, 2024

integrated save ai generated text to the bucket

c412123

eljanmahammadli committed on Aug 19, 2024

decreased num workers

f801525

eljanmahammadli committed on Aug 19, 2024

Added MC model to UI and removed some unnecessary code

5534eb0

eljanmahammadli committed on Aug 19, 2024

adding mail as format type and plain text to structure

d09cdf3

eljanmahammadli committed on Aug 19, 2024

added yellow to highlighter + adjusted thresholds

7c7ccca

minko186 committed on Aug 14, 2024

fixed shared history + add clear history button

b96ba8b

minko186 committed on Aug 14, 2024

enable ai model selection and api key

bf91121

eljanmahammadli committed on Aug 14, 2024

removed unused imports

8f26ea6

eljanmahammadli committed on Aug 14, 2024

history auto refresh + rag search term includes topic and context

2a53cb7

minko186 committed on Aug 13, 2024

added history for generated text

439d01d

eljanmahammadli committed on Aug 13, 2024

changed split logic to resolve short generated text, more search website and some logging

59fbf6a

eljanmahammadli committed on Aug 13, 2024

adding queue for parallel request

8bd7fd1

eljanmahammadli committed on Aug 13, 2024

format updates + search added to RAG instead

708f094

minko186 committed on Aug 13, 2024

fix reference new line

da88846

eljanmahammadli committed on Aug 9, 2024

merge main + multi pdfs + updated html cleaning + better references

43d4e83

minko186 committed on Aug 7, 2024

moved google search api to env file

48d4d11

eljanmahammadli committed on Aug 7, 2024

merged adapter and base model for XL

924cb86

eljanmahammadli committed on Aug 7, 2024

new XL Model

7f46ae3

eljanmahammadli committed on Aug 7, 2024

fixed url_content var error

10aedaa

minko186 committed on Aug 7, 2024

sync with humanize.py from main

078999d

minko186 committed on Aug 7, 2024
