{% extends "page.html" %} {% block stylesheet %}
<style>
.left-align {
text-align: left;
}
.center-align {
text-align: center;
}
.container {
margin: 20px auto;
max-width: 800px;
}
h2,
h3,
h4 {
margin-top: 20px;
}
p {
line-height: 1.6;
}
ul {
margin-left: 20px;
}
</style>
{% endblock %} {% block site %}
<div id="jupyter-main-app" class="container">
<div class="center-align">
<img
src="https://huggingface.co/datasets/davanstrien/assets/resolve/main/logo.jpg"
alt="Space Logo"
style="width: 75%"
/>
<p>
This Space provides an easy way to get started generating synthetic
datasets, using Spaces compute to host open LLMs. It comes with a
ready-to-go environment and a series of notebooks showing various
examples of synthetic dataset generation.
</p>
</div>
<div class="left-align">
<h2>What's covered?</h2>
<p>Currently this Space has notebooks covering the following topics:</p>
<h3>Creating synthetic text similarity datasets</h3>
<p>
A set of notebooks covering the steps for creating a synthetic dataset for
fine-tuning a sentence similarity model. These notebooks cover:
</p>
<ul>
<li>
How to do structured generation using the
<a href="https://github.com/outlines-dev/outlines">outlines</a> library
to gain more control over the outputs generated by an LLM.
</li>
<li>
How to use
<a href="https://docs.llamaindex.ai/en/stable/">LlamaIndex</a> to chunk
texts so they fit within the context length of sentence embedding models.
</li>
<li>
How to use <a href="https://github.com/vllm-project/vllm">vLLM</a> to
efficiently create a dataset that can be used to fine-tune a sentence
similarity model.
</li>
</ul>
</div>
<div class="center-align">
<h2>Using the Space</h2>
<p>
To run this Space yourself, click the duplicate button. You'll want to
enable persistent storage so you can save your work. To start, you may
want to use a smaller GPU such as the T4, then switch to a larger GPU
when you want to generate data with bigger models.
</p>
<h2>Duplicate the Space to run your own instance</h2>
<h4>The default token is <span style="color: orange">huggingface</span></h4>
</div>
{% if login_available %}
<div class="center-align">
<form
action="{{base_url}}login?next={{next}}"
method="post"
class="form-inline"
>
{{ xsrf_form_html() | safe }} {% if token_available %}
<label for="password_input"
><strong>{% trans %}Token:{% endtrans %}</strong></label
>
{% else %}
<label for="password_input"
><strong>{% trans %}Password:{% endtrans %}</strong></label
>
{% endif %}
<input
type="password"
name="password"
id="password_input"
class="form-control"
/>
<button type="submit" class="btn btn-default" id="login_submit">
{% trans %}Log in{% endtrans %}
</button>
</form>
</div>
{% else %}
<div class="center-align">
<p>
{% trans %}No login available, you shouldn't be seeing this page.{%
endtrans %}
</p>
</div>
{% endif %}
<div class="center-align" style="font-size: 0.8em; color: #888">
<p>
This template was created by
<a href="https://twitter.com/camenduru" target="_blank">camenduru</a> and
<a href="https://huggingface.co/nateraw" target="_blank">nateraw</a>, with
contributions from
<a href="https://huggingface.co/osanseviero" target="_blank"
>osanseviero</a
>
and <a href="https://huggingface.co/azzr" target="_blank">azzr</a>
</p>
</div>
{% if message %}
<div class="row">
{% for key in message %}
<div class="message {{key}}">{{message[key]}}</div>
{% endfor %}
</div>
{% endif %} {% if token_available %} {% block token_message %} {% endblock
token_message %} {% endif %}
</div>
{% endblock %} {% block script %} {% endblock %}