{% extends "page.html" %} {% block stylesheet %} {% endblock %} {% block site %}

This Space gives you an easy way to start generating synthetic datasets, using Spaces compute to host open LLMs. It comes with a ready-to-go environment and a series of notebooks showing various examples of synthetic dataset generation. You can read more about the aims of the Space in this blog post.

What's covered?

Currently this Space has notebooks covering the following topics:

Creating synthetic text similarity datasets

A set of notebooks covering the steps for creating a synthetic dataset for fine-tuning a sentence similarity model. These notebooks cover:

Using the Space

To use this Space, duplicate it. To make sure your work is saved, enable persistent storage for your Space. To start, you may want to use a smaller GPU such as the T4, then switch to a larger GPU when you want to run larger LLMs or generate more data. You can also preview the notebooks in the Space without running them. The Jupyter notebooks are in the notebooks folder.

Duplicate the Space to run your own instance



The default token is huggingface

{% if login_available %}
{{ xsrf_form_html() | safe }} {% if token_available %} {% else %} {% endif %}
{% else %}

{% trans %}No login available, you shouldn't be seeing this page.{% endtrans %}

{% endif %}

This template was created by camenduru and nateraw, with contributions from osanseviero and azzr.

{% if message %}
{% for key in message %}
{{message[key]}}
{% endfor %}
{% endif %} {% if token_available %} {% block token_message %} {% endblock token_message %} {% endif %}
{% endblock %} {% block script %} {% endblock %}