Data preparation and fine-tuning

#2
by snehilsanyal - opened

Hey, @JacksonLark can you provide some information on data preparation for the OIG datasets? Also, how did you split the dataset for fine-tuning? Any supporting resources would be helpful.

Lark AI org


  1. Data preparation depends on your training code. My training data looks like this:
{
    "instruction":"Given the following schema:\nroad (road_name, state_name)\nstate (state_name, capital, population, area, country_name, density)\nhighlow (state_name, highest_point, highest_elevation, lowest_point, lowest_elevation)\nlake (lake_name, area, state_name, country_name)\nriver (river_name, length, traverse, country_name)\nborder_info (state_name, border)\nmountain (mountain_name, mountain_altitude, state_name, country_name)\ncity (city_name, state_name, population, country_name)\nWrite a SQL query to what states does the mississippi river run through",
    "input":"",
    "output":"SELECT traverse FROM river WHERE river_name = \"mississippi\" ;"
}
  2. Fine-tuning is simple: you can use the Hugging Face example code: https://github.com/huggingface/transformers/blob/main/examples/pytorch/summarization/README.md
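
For anyone who wants a starting point, here is a minimal sketch of how records in the format above could be turned into source/target pairs and split into training and validation sets. The thread does not say how the original split was done, so the 90/10 ratio, the fixed seed, the output column names `text` and `summary`, and the helper names `build_example`, `split_examples`, and `write_jsonl` are all assumptions made for illustration, not the author's actual pipeline.

```python
import json
import random


def build_example(record):
    """Map one instruction/input/output record to a source/target pair.

    The "text" and "summary" keys are assumed column names, not ones
    confirmed in this thread.
    """
    source = record["instruction"]
    # Append the optional "input" field only when it is non-empty.
    if record.get("input"):
        source = source + "\n" + record["input"]
    return {"text": source, "summary": record["output"]}


def split_examples(examples, val_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a validation slice.

    The 10% validation fraction is an assumption for illustration.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    if len(shuffled) > 1:
        n_val = max(1, int(len(shuffled) * val_fraction))
    else:
        n_val = 0  # a single example cannot be split
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(path, examples):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")
```

Calling `split_examples([build_example(r) for r in records])` on a list of such records returns a `(train, validation)` pair, and each half can then be saved with `write_jsonl` and passed to whatever training script you use. Check that script's README for the column names it expects before relying on `text` and `summary`.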

Do you have any leads now, @snehilsanyal? I am new to this, so any help from you would be appreciated.

@NikAlan sorry, I have been busy with other work. I will get back to fine-tuning and let you know.
