{% extends "page.html" %} {% block stylesheet %}
<style>
.left-align {
text-align: left;
}
.center-align {
text-align: center;
}
.container {
margin: 20px auto;
max-width: 800px;
}
h2,
h3,
h4 {
margin-top: 20px;
}
p {
line-height: 1.6;
}
ul {
margin-left: 20px;
}
</style>
{% endblock %} {% block site %}
<div id="jupyter-main-app" class="container">
<div class="center-align">
<img
src="https://huggingface.co/datasets/davanstrien/assets/resolve/main/logo.jpg"
alt="Space Logo"
style="width: 80%"
/>
<p>
This Space gives you an easy way to get started generating synthetic
datasets, using Spaces compute to host open LLMs. It comes with a
ready-to-go environment and a series of notebooks showing various
examples of synthetic dataset generation.
You can read more about the aims of the Space in this <a href="https://huggingface.co/blog/davanstrien/synthetic-data-workshop" target="_blank">blog post</a>.
</p>
</div>
<div class="left-align">
<h2>What's covered?</h2>
<p>Currently this Space has notebooks covering the following topics:</p>
<h3>Creating synthetic text similarity datasets</h3>
<p>
A set of notebooks covering the steps for creating a synthetic dataset for
fine-tuning a sentence similarity model. These notebooks cover:
</p>
<ul>
<li>
How to do structured generation using the
<a href="https://github.com/outlines-dev/outlines" target="_blank">outlines</a> library
to gain more control over the outputs generated by an LLM.
</li>
<li>
How to use
<a href="https://docs.llamaindex.ai/en/stable/" target="_blank">LlamaIndex</a> to chunk
texts to fit into the context length of sentence embedding models.
</li>
<li>
How to use <a href="https://github.com/vllm-project/vllm" target="_blank">vLLM</a> to
efficiently create a dataset that can be used to fine-tune a sentence
similarity model.
</li>
</ul>
</div>
<div class="center-align">
<h2>Using the Space</h2>
<p>
To use this Space, you should <a href="https://huggingface.co/spaces/davanstrien/synthetic-data-workshop?duplicate=true" target="_blank">duplicate it</a>.
To make sure your work is saved, enable persistent storage for your Space.
To start, you may want to use a smaller GPU like the T4, then switch to a bigger GPU when you want to run larger LLMs or generate more data.
<b>Reminder:</b> you can preview the notebooks in the Space without running
them. You can find the Jupyter Notebooks in the <a href="https://huggingface.co/spaces/davanstrien/synthetic-data-workshop/tree/main/notebooks" target="_blank">notebooks folder</a>.
</p>
<h2>Duplicate the Space to run your own instance</h2>
<br />
<a
class="duplicate-button"
style="display: inline-block"
target="_blank"
href="https://huggingface.co/spaces/davanstrien/synthetic-data-workshop?duplicate=true"
>
<img
style="margin: 0"
src="https://img.shields.io/badge/-Duplicate%20Space-blue?labelColor=white&amp;style=flat&amp;logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAAXNSR0IArs4c6QAAAP5JREFUOE+lk7FqAkEURY+ltunEgFXS2sZGIbXfEPdLlnxJyDdYB62sbbUKpLbVNhyYFzbrrA74YJlh9r079973psed0cvUD4A+4HoCjsA85X0Dfn/RBLBgBDxnQPfAEJgBY+A9gALA4tcbamSzS4xq4FOQAJgCDwV2CPKV8tZAJcAjMMkUe1vX+U+SMhfAJEHasQIWmXNN3abzDwHUrgcRGmYcgKe0bxrblHEB4E/pndMazNpSZGcsZdBlYJcEL9Afo75molJyM2FxmPgmgPqlWNLGfwZGG6UiyEvLzHYDmoPkDDiNm9JR9uboiONcBXrpY1qmgs21x1QwyZcpvxt9NS09PlsPAAAAAElFTkSuQmCC&amp;logoWidth=14"
alt="Duplicate Space"
/>
</a>
<br />
<br />
<h4>The default token is <span style="color: orange">huggingface</span></h4>
</div>
{% if login_available %}
<div class="center-align">
<form
action="{{base_url}}login?next={{next}}"
method="post"
class="form-inline"
>
{{ xsrf_form_html() | safe }} {% if token_available %}
<label for="password_input"
><strong>{% trans %}Token:{% endtrans %}</strong></label
>
{% else %}
<label for="password_input"
><strong>{% trans %}Password:{% endtrans %}</strong></label
>
{% endif %}
<input
type="password"
name="password"
id="password_input"
class="form-control"
/>
<button type="submit" class="btn btn-default" id="login_submit">
{% trans %}Log in{% endtrans %}
</button>
</form>
</div>
{% else %}
<div class="center-align">
<p>
{% trans %}No login available, you shouldn't be seeing this page.{%
endtrans %}
</p>
</div>
{% endif %}
<div class="center-align" style="font-size: 0.8em; color: #888">
<p>
This template was created by
<a href="https://twitter.com/camenduru" target="_blank">camenduru</a> and
<a href="https://huggingface.co/nateraw" target="_blank">nateraw</a>, with
contributions from
<a href="https://huggingface.co/osanseviero" target="_blank"
>osanseviero</a
>
and <a href="https://huggingface.co/azzr" target="_blank">azzr</a>.
</p>
</div>
{% if message %}
<div class="row">
{% for key in message %}
<div class="message {{key}}">{{message[key]}}</div>
{% endfor %}
</div>
{% endif %} {% if token_available %} {% block token_message %} {% endblock
token_message %} {% endif %}
</div>
{% endblock %} {% block script %} {% endblock %}