Website links that automatically improve their accuracy with the assistance of a Large Language Model (LLM)
Linking within a website enhances navigation and user experience by connecting related content. This connectivity ensures users can easily explore various sections without getting lost, ultimately improving engagement and satisfaction.
Every webmaster should link terms on one page to related pages within the website, and link back in return. This can improve search rankings and makes it easier for users to find relevant information on your website.
More importantly, internal links help drive sales of products or services by guiding visitors through different sections of your website. For an informative site in particular, proper use of internal links can greatly enhance the user experience by helping them better understand the content they are reading.
Now imagine creating a new page and having to search meticulously through countless existing entries just to link them appropriately. It is not exactly rocket science, but it is an exercise in extreme patience, and you soon realize how much time is wasted on this trivial chore.
I mean, who knew managing an online presence could be such a masochistic, monotonous routine?
A single webpage can contain anywhere from ten to thirty crucial terms that link both internally within the site and externally.
Locating pertinent information is an incredibly time-consuming task!
Workflow: Semantic Search with Large Language Model Embeddings
This workflow outlines a process for implementing semantic search using large language model (LLM) embeddings. The goal is to create intelligent links on web pages that automatically connect users to the most relevant content based on context.
Steps:
Data Storage Setup:
- Use PostgreSQL with the vector data type provided by the pgvector extension to store and manage LLM-generated text embeddings efficiently.
Text Chunking:
- Divide all website page texts into manageable chunks or segments for processing.
Embedding Generation:
- Generate embeddings (vector representations) for each chunk of text using a large language model.
Template Markup Creation:
- Develop template markup to identify specific terms within the pages that require semantic links.
Markup Interpolation and Linking Program:
- Implement an automated program to process these marked-up terms.
- Convert each marked term into its corresponding LLM embedding.
- Compare this embedding with all stored embeddings in the database using a similarity metric (e.g., cosine similarity).
- Identify the most relevant page based on the highest similarity score (equivalently, the lowest cosine distance, which is what pgvector's <=> operator returns).
- Incorporate an automatic link to this most relevant page within the original content.
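The comparison and selection steps above can be sketched in plain Emacs Lisp. This is a minimal in-memory illustration, not the database-backed implementation: the names my-cosine-similarity and my-most-relevant-page are hypothetical, and PAGES is assumed to be an alist of page URLs and embedding vectors already fetched from storage. Note that pgvector's <=> operator returns cosine distance, which is 1 minus the similarity computed here.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-cosine-similarity (a b)
  "Return the cosine similarity of the numeric vectors A and B.
Hypothetical helper; returns 0.0 if either vector has zero length."
  (let ((dot 0.0) (norm-a 0.0) (norm-b 0.0))
    (dotimes (i (length a))
      (let ((x (aref a i))
            (y (aref b i)))
        (setq dot (+ dot (* x y))
              norm-a (+ norm-a (* x x))
              norm-b (+ norm-b (* y y)))))
    (if (or (zerop norm-a) (zerop norm-b))
        0.0
      (/ dot (* (sqrt norm-a) (sqrt norm-b))))))

(defun my-most-relevant-page (term-embedding pages)
  "Return the URL in PAGES whose embedding is most similar to TERM-EMBEDDING.
PAGES is an alist of (URL . EMBEDDING-VECTOR).  Returns nil if PAGES
is empty.  Hypothetical helper for illustration only."
  ;; Cosine similarity lies in [-1, 1], so -2.0 is below any real score.
  (let (best-url (best-score -2.0))
    (dolist (page pages best-url)
      (let ((score (my-cosine-similarity term-embedding (cdr page))))
        (when (> score best-score)
          (setq best-url (car page)
                best-score score))))))
```

For example, (my-most-relevant-page [0.9 0.1 0.0] '(("/a.html" . [1.0 0.0 0.0]) ("/b.html" . [0.0 1.0 0.0]))) returns "/a.html". Real embeddings have hundreds of dimensions, which is why the actual comparison is better left to the database.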
Benefits:
- Enhanced User Experience: Users are directed to highly contextually related information, improving navigation and understanding of complex topics.
- Efficiency: Automates the process of creating semantic links, reducing manual effort while maintaining accuracy.
I won't cover the details of setting up data storage, text chunking, or generating embeddings in this explanation. However, if you're interested in learning more about these topics or need assistance with PostgreSQL setup specifically, feel free to reach out at any time; I'll be happy to help!
Template Markup Creation
My RCD Template Interpolation System package for Emacs provides interpolation functions and also lets me customize the markup.
By default, I use the following markup to interpolate template tags:
⟦ (lisp-command "Something") ⟧
As primarily an Emacs Lisp user, I want to point out that similar templating and string interpolation features exist in other programming languages, such as Python or whatever else you might be using. If you're curious about how it works in a specific language, feel free to ask an LLM for guidance!
I like these special delimiters because they are rarely used in markup and text:
- ⟦ MATHEMATICAL LEFT WHITE SQUARE BRACKET (U+27E6), the opening delimiter
- ⟧ MATHEMATICAL RIGHT WHITE SQUARE BRACKET (U+27E7), the closing delimiter
I use interpolation of both variables and Lisp code within various types of markup documents such as Markdown, Org mode, or Asciidoctor. Typically, the template is interpolated first, and the result is then processed by one of these markup processors.
However, in this case I don't need any variable or code interpolation; all I want is to mark terms within ordinary text. Therefore, I use the following separate delimiters:
(defcustom rcd-template-any-delimiter-open "〈"
  "The opening delimiter for RCD Template Interpolation System."
  :group 'rcd
  :type 'string)

(defcustom rcd-template-any-delimiter-close "〉"
  "The closing delimiter for RCD Template Interpolation System."
  :group 'rcd
  :type 'string)
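The internals of rcd-template-eval-any are not shown here, so the following is only a hypothetical stand-in that illustrates the idea: find every span between the two delimiters, trim surrounding whitespace from the enclosed term, and replace the whole span with whatever a given function returns for that term. The name my-interpolate-delimited is an assumption, and the delimiters are passed as arguments (defaulting to 〈 and 〉) so the sketch does not depend on the defcustoms above.

```elisp
;;; -*- lexical-binding: t; coding: utf-8 -*-
(require 'subr-x) ; for `string-trim' on older Emacs versions

(defun my-interpolate-delimited (text fn &optional open close)
  "Replace every OPEN ... CLOSE span in TEXT with the result of (FN TERM).
TERM is the enclosed string with surrounding whitespace trimmed.
OPEN and CLOSE default to \"〈\" and \"〉\".  Hypothetical sketch."
  (let* ((open (or open "〈"))
         (close (or close "〉"))
         ;; Non-greedy match of any characters, newlines included,
         ;; between the two delimiters; group 1 is the enclosed term.
         (regexp (concat (regexp-quote open)
                         "\\(\\(?:.\\|\n\\)*?\\)"
                         (regexp-quote close))))
    (replace-regexp-in-string
     regexp
     (lambda (match)
       ;; When REP is a function, the match data refers to MATCH itself.
       (funcall fn (string-trim (match-string 1 match))))
     text
     t   ; FIXEDCASE: do not alter the case of the replacement
     t))) ; LITERAL: insert FN's result verbatim
```

With a trivial function, (my-interpolate-delimited "About 〈 Emacs Lisp 〉 and 〈GNU Emacs〉." #'upcase) returns "About EMACS LISP and GNU EMACS.". In the real workflow, FN would be a semantic-linking function that returns a hyperlink instead.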
How is this used in practice?
Imagine that I am writing this (precisely this) paragraph about 〈Emacs Lisp〉. To mark a piece of text for conversion into a "universal semantic link", I navigate to the words "Emacs Lisp", select them, and then call a function, perhaps bound to a key, that encloses the selected terms in the delimiters.
You can see how I did it: 〈Emacs Lisp〉. Spaces before or after the enclosed string should not matter.
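The marking command itself is not shown in this article, so here is a minimal sketch of what such a function could look like. The name my-wrap-region-in-delimiters is hypothetical, and it hard-codes 〈 and 〉 rather than reading the defcustoms above, so that it stays self-contained.

```elisp
;;; -*- lexical-binding: t; coding: utf-8 -*-

(defun my-wrap-region-in-delimiters (beg end)
  "Enclose the text between BEG and END in 〈 and 〉.
Hypothetical command; when called interactively, BEG and END are
the bounds of the active region."
  (interactive "r")
  (save-excursion
    ;; Insert the closing delimiter first, so that BEG stays valid.
    (goto-char end)
    (insert "〉")
    (goto-char beg)
    (insert "〈")))
```

It could then be bound to a key of your choice, for example (global-set-key (kbd "C-c m") #'my-wrap-region-in-delimiters), where the key is only an illustration.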
Words or terms enclosed by delimiters will be transformed into embeddings. These embeddings will then be compared with those stored in the database that correspond either to the same section of a website, a specific set of pages, or external links.
My personal function is the following:
(defun wrs-area-pages-by-embeddings (&optional query limit link)
  "Search pages in the current Hyperscope or selected website area
by embedding similarity to QUERY.

QUERY defaults to the selected region or, if there is none, to user
input.  It is converted into an embedding with
`rcd-llm-get-embedding-single'.  The area is taken from the currently
active Hyperscope table or, if none is set, selected from the
available areas.

LIMIT restricts the number of embedding rows considered by the SQL
subquery (default 10).  Rows with a cosine distance (pgvector `<=>')
of 0.5 or more are discarded; the remaining rows are grouped by page
and ordered by their smallest distance.

If LINK is non-nil, return the ID of the best-matching page, or nil.
Otherwise open the matching IDs in Hyperscope mode with the query
text as context."
  (interactive)
  (let* ((query (or query (rcd-region-string) (rcd-ask-get "Query: ")))
         ;; Keep the query text separate from its embedding.
         (embedding (rcd-llm-get-embedding-single query))
         (limit (or limit 10))
         (area (or (and rcd-db-current-table-id
                        (hyperscope-area rcd-db-current-table-id))
                   (wrs-areas-select)))
         (id-list (rcd-sql-list
                   "SELECT subquery.embeddings_referencedid
                      FROM (
                        SELECT e.embeddings_referencedid,
                               e.embeddings_embeddings <=> $1 AS distance
                          FROM embeddings e
                          JOIN hyobjects h ON e.embeddings_referencedid = h.hyobjects_id
                         WHERE e.embeddings_embeddingtypes = 1
                           AND e.embeddings_embeddings <=> $1 < 0.5
                           AND h.hyobjects_areas = $2
                         ORDER BY distance ASC
                         LIMIT $3
                      ) AS subquery
                     GROUP BY subquery.embeddings_referencedid
                     ORDER BY MIN(subquery.distance) ASC"
                   rcd-db embedding area limit)))
    (cond ((and link id-list) (car id-list))
          (link nil)
          (id-list (hyperscope-by-id-list id-list (format "Query: %s" query)))
          (t (rcd-message "Matches not found")))))
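To make the SQL's ranking logic easier to follow, here is an in-memory sketch of the same idea in Emacs Lisp: every stored chunk contributes one row of page ID and distance, rows at or above a threshold are dropped, each page keeps its smallest distance, and pages are ordered by that distance. The name my-rank-pages is hypothetical. One deliberate simplification: here LIMIT counts distinct pages, whereas in the SQL the inner LIMIT counts chunk rows before grouping, so the query may return fewer distinct pages than LIMIT.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-rank-pages (rows &optional threshold limit)
  "Rank page IDs from ROWS by their best (smallest) distance.
ROWS is a list of (PAGE-ID . DISTANCE), one element per stored chunk.
Only rows with DISTANCE below THRESHOLD (default 0.5) are kept, and
at most LIMIT (default 10) page IDs are returned.  Hypothetical sketch."
  (let ((threshold (or threshold 0.5))
        (limit (or limit 10))
        (best (make-hash-table :test #'equal))
        result
        ids)
    ;; Like WHERE distance < 0.5 ... GROUP BY id with MIN(distance).
    (dolist (row rows)
      (let ((id (car row))
            (distance (cdr row)))
        (when (< distance threshold)
          (let ((old (gethash id best)))
            (when (or (null old) (< distance old))
              (puthash id distance best))))))
    (maphash (lambda (id distance) (push (cons id distance) result)) best)
    ;; Like ORDER BY MIN(distance) ASC.
    (setq result (sort result (lambda (a b) (< (cdr a) (cdr b)))))
    ;; Keep at most LIMIT page IDs.
    (while (and result (< (length ids) limit))
      (push (car (pop result)) ids))
    (nreverse ids)))
```

For example, (my-rank-pages '((1 . 0.30) (2 . 0.10) (1 . 0.05) (3 . 0.70) (4 . 0.45))) returns (1 2 4): page 1 is ranked by its best chunk (0.05), and page 3 is excluded because 0.70 exceeds the threshold.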
Then a higher-level function runs it during template interpolation:
(defun wrs-semantic-link (query)
  "Return QUERY as a Markdown link to the most similar page.
The page is found with `wrs-area-pages-by-embeddings'.  This function
is passed to `rcd-template-eval-any' by `wrs-semantic-link-region',
which applies it to the marked terms.  If no page matches, QUERY is
returned unchanged."
  (let* ((id (wrs-area-pages-by-embeddings query nil t))
         (link (and id (hyperscope-url-link id))))
    (cond ((and link (rcd-string-not-empty-p link))
           (format "[%s](%s)" query link))
          (t query))))

(defun wrs-semantic-link-region ()
  "Replace the selected text with its semantically linked version.
Marked terms in the region are interpolated by `rcd-template-eval-any'
using `wrs-semantic-link'.  If no region is active, do nothing.

The region text is obtained with `rcd-region-string', and the result
replaces the original text via `rcd-region-string-replace'."
  (interactive)
  (let ((region (rcd-region-string)))
    (when region
      (let ((interpolated (rcd-template-eval-any region #'wrs-semantic-link)))
        (rcd-region-string-replace interpolated)))))
How is the function used?
The terms in the text must be marked, like here: 〈Emacs Lisp〉 is a dialect of the Lisp programming language used by the 〈GNU Emacs〉 text editor for extending and customizing its functionality.
The text is interpolated by the rcd-template-eval-any function from the RCD Template Interpolation System for Emacs; exactly how a programmer invokes it depends on their text-processing workflow. Any marked terms receive their links in the appropriate markup, whatever it is.
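The wrs-semantic-link function above hard-codes Markdown links. As a hedged sketch of how "the appropriate markup" could be chosen, here is a hypothetical helper, my-format-link, that formats the same term and URL as Markdown, Org mode, or AsciiDoc. It does not escape special characters such as ] inside the link text, which a real implementation would need to handle.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'subr-x) ; for `string-empty-p'

(defun my-format-link (text url markup)
  "Return a hyperlink to URL labelled TEXT in the given MARKUP.
MARKUP is one of the symbols `markdown', `org' or `asciidoc'.
If URL is nil or empty, return TEXT unchanged.  Hypothetical helper."
  (if (or (null url) (string-empty-p url))
      text
    (pcase markup
      ('markdown (format "[%s](%s)" text url))   ; [text](url)
      ('org (format "[[%s][%s]]" url text))      ; [[url][text]]
      ('asciidoc (format "link:%s[%s]" url text)) ; link:url[text]
      (_ (error "Unknown markup: %S" markup)))))
```

For example, (my-format-link "GNU Emacs" "https://www.gnu.org/software/emacs/" 'org) returns "[[https://www.gnu.org/software/emacs/][GNU Emacs]]". Returning TEXT unchanged when no URL is found mirrors how wrs-semantic-link falls back to the plain query.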
It's important to note that, beyond marking specific terms in the text, minimal further human involvement is required: the system links the marked terms automatically, without additional input from a person.
Here is how the region function works on a sample sentence.
Sample text
Emacs input methods provide a way to input characters from various writing systems and languages.
...then markup...
〈Emacs input methods〉 provide a way to input characters from various writing systems and languages.
...followed by semantic linking...
[Emacs input methods](https://gnu.support/gnu-emacs/emacs-lisp/Emacs-Lisp-input-method-for-FULLWIDTH-LATIN-LETTERS.html) provide a way to input characters from various writing systems and languages.
In the Markdown source, this looks like:
[Emacs input methods](https://gnu.support/gnu-emacs/emacs-lisp/Emacs-Lisp-input-method-for-FULLWIDTH-LATIN-LETTERS.html) provide a way to input characters from various writing systems and languages.
Another example: "Let us say we are speaking of 〈Large Language Models〉" becomes:
Let us say we are speaking of [Large Language Models](https://gnu.support/large-language-models-llm/index.html)
IMPORTANT TO NOTE: as the author, I no longer look up links myself; I leave that decision to the system.
My only job is to mark the terms I want hyperlinked through semantic search.
How do these semantically generated website links automatically improve their accuracy?
The more website pages you write, the more semantic content you add, which improves matching accuracy. With each page update, there may be changes in semantic linking that can be verified and optimized through your own algorithms for greater precision.
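One simple way to verify such changes, offered here only as a hypothetical sketch, is to record the term-to-URL pairs produced by each linking run and compare them before publishing. The helper my-changed-links and the alist format are assumptions. It reports terms whose target changed or that are newly linked; terms that disappeared from the new run are not reported.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-changed-links (old new)
  "Return the terms whose target URL differs between OLD and NEW.
OLD and NEW are alists of (TERM . URL) from two linking runs.  Each
element of the result is (TERM OLD-URL NEW-URL), where OLD-URL is
nil for newly linked terms.  Hypothetical helper."
  (let (changes)
    (dolist (entry new (nreverse changes))
      (let* ((term (car entry))
             (old-url (cdr (assoc term old))))
        (unless (equal old-url (cdr entry))
          (push (list term old-url (cdr entry)) changes))))))
```

For example, comparing '(("Emacs Lisp" . "/a") ("LLM" . "/b")) with '(("Emacs Lisp" . "/a") ("LLM" . "/c") ("GNU" . "/d")) yields (("LLM" "/b" "/c") ("GNU" nil "/d")), a short list a human can review.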
Automated link building is a powerful tool for webmasters; it requires human oversight to mark up relevant text but significantly reduces tedious tasks like link searching and manual updates.
Furthermore, because each link is resolved from a page ID rather than typed by hand, changed URLs can be updated automatically across all website pages by interpolating the marked-up text again. This article aims to give developers concepts that they can implement in their own projects.