betterdataai/large-tabular-model

Developed by: betterdataai
License: apache-2.0
Finetuned from model: unsloth/Llama-3.2-3B-Instruct

Prerequisite

The following packages are required for inference:

unsloth 
transformers
pandas
datasets
trl
torch
accelerate
scipy
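
Before loading the model, it can help to confirm that the packages above are importable. The snippet below is a minimal sketch, not part of the original model card: `missing_packages` is a hypothetical helper, and it assumes each package's import name matches the name in the list.

```python
from importlib.util import find_spec

# Package names taken from the prerequisite list above.
REQUIRED = ["unsloth", "transformers", "pandas", "datasets",
            "trl", "torch", "accelerate", "scipy"]

def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    # find_spec only looks the module up; it does not import it.
    return [name for name in names if find_spec(name) is None]

print("Missing packages:", missing_packages(REQUIRED))
```

Any name reported as missing can be installed with pip before running the inference code.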

Model Demonstration

This is a large tabular model that generates synthetic tabular data from a user-supplied description of the dataset's columns.

The example prompt looks like this:

instruction = """
You are tasked with generating a synthetic dataset based on the following description. The dataset represents network traffic information. The dataset should include the following columns:

- IPV4_SRC_ADDR (String): IPv4 source address, following the standard format (e.g., ""59.166.0.6"", ""149.171.126.0"",""175.45.176.2"").
- L4_SRC_PORT (Integer): IPv4 source port number, a value between 1024 and 65535 (e.g., 443).
- IPV4_DST_ADDR (String): IPv4 destination address, following the standard format (e.g., ""149.171.126.6"").
- L4_DST_PORT (Integer): IPv4 destination port number, a value between 1024 and 65535 (e.g., 80).
- PROTOCOL (Integer): IP protocol identifier byte, representing the protocol used (e.g., 6 for TCP or 17 for UDP).
- L7_PROTO (Integer): Layer 7 protocol (numeric), indicating the application protocol, ranging from 0 to 249 (e.g., 1 for HTTP, 2 for HTTPS).
- IN_BYTES (Integer): Incoming number of bytes, representing the data transferred into the network, ranging from 0 to 10,000,000 (e.g., 1500).
- OUT_BYTES (Integer): Outgoing number of bytes, representing the data transferred out of the network, ranging from 0 to 10,000,000 (e.g., 2000).
- IN_PKTS (Integer): Incoming number of packets, representing the count of packets entering the network (e.g., 120).
- OUT_PKTS (Integer): Outgoing number of packets, representing the count of packets leaving the network (e.g., 110).
- TCP_FLAGS (Integer): Cumulative of all TCP flags (e.g., 27, 0, 19, 18 ).
- FLOW_DURATION_MILLISECONDS (Integer): Flow duration in milliseconds, indicating how long the flow lasted (e.g., 15000).
- Label (Integer): Label for indicating malicious attack or not (e.g., 0 for benign traffic or 1 for attack)
"""
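
For other datasets, a prompt in the same layout can be assembled from a list of column specifications. The helper below is a sketch under assumptions: `build_instruction` is a hypothetical function, and the wording simply mirrors the example prompt above. The model card does not state which prompt variations the model tolerates.

```python
def build_instruction(summary, columns):
    """Build a prompt in the layout of the example above.

    summary: one sentence describing the dataset.
    columns: list of (name, type, description) tuples.
    """
    lines = [
        "You are tasked with generating a synthetic dataset based on the "
        f"following description. {summary} "
        "The dataset should include the following columns:",
        "",
    ]
    # One bullet per column: "- NAME (Type): description"
    lines += [f"- {name} ({dtype}): {desc}" for name, dtype, desc in columns]
    # The example prompt starts and ends with a newline, so do the same here.
    return "\n" + "\n".join(lines) + "\n"

instruction_sketch = build_instruction(
    "The dataset represents network traffic information.",
    [
        ("PROTOCOL", "Integer",
         "IP protocol identifier byte (e.g., 6 for TCP or 17 for UDP)."),
        ("Label", "Integer",
         "0 for benign traffic or 1 for attack."),
    ],
)
print(instruction_sketch)
```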

The following code loads the model and generates tabular data from this prompt:

from unsloth import FastLanguageModel
from transformers import TextStreamer

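max_seq_length = 2048  # maximum context length in tokens
dtype = None  # None lets Unsloth auto-detect the dtype
load_in_4bit = False  # set to True for 4-bit quantization to reduce memory use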

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "betterdataai/large-tabular-model", 
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(model)

# The column description is the system prompt; the user turn requests the row count.
messages = [{"role": "system", "content": instruction},
    {"role": "user", "content": "Create 20 rows data"}]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")


# Stream generated tokens to stdout as they are produced, omitting the prompt.
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048,
                   use_cache = True, temperature = 1.5, min_p = 0.1)

The output is CSV text with a header row. The trailing <|eot_id|> is the model's end-of-turn token:

IPV4_SRC_ADDR,L4_SRC_PORT,IPV4_DST_ADDR,L4_DST_PORT,PROTOCOL,L7_PROTO,IN_BYTES,OUT_BYTES,IN_PKTS,OUT_PKTS,TCP_FLAGS,FLOW_DURATION_MILLISECONDS,Label
175.45.176.3,65502,149.171.126.11,80,6,7.0,800,1338,10,10,27,1429,0
59.166.0.2,51487,149.171.126.3,80,6,7.0,1580,10168,12,18,27,0,0
59.166.0.0,13943,149.171.126.0,11862,6,36.0,2334,16822,36,38,27,9,0
59.166.0.7,40294,149.171.126.7,21,6,1.0,2934,3740,52,54,27,844,0
59.166.0.9,63416,149.171.126.5,21,6,1.0,2934,3742,52,54,27,0,0
175.45.176.2,0,149.171.126.17,0,45,0.0,200,0,2,0,0,0,1
175.45.176.3,64403,149.171.126.14,179,6,13.0,472,336,10,8,19,538,0
59.166.0.8,39142,149.171.126.3,53,17,5.0,130,162,2,2,0,1,0
59.166.0.3,60342,149.171.126.4,25,6,3.0,37868,3380,54,42,27,35,0
59.166.0.3,40433,149.171.126.5,5190,6,0.0,2158,2464,24,24,27,6,0
59.166.0.0,21116,149.171.126.5,53,17,5.0,130,162,2,2,0,0,0
175.45.176.1,0,149.171.126.17,0,23,0.0,200,0,2,0,0,0,1
59.166.0.5,27940,149.171.126.2,21,6,1.0,2934,3738,52,54,27,4294952,0
59.166.0.2,14905,149.171.126.1,22,6,92.0,3728,5474,32,24,27,0,0
175.45.176.1,0,149.171.126.10,0,33,0.0,200,0,2,0,0,0,1
59.166.0.3,37986,149.171.126.0,5190,6,0.0,1470,1728,22,14,27,4,0
59.166.0.1,49949,149.171.126.7,80,6,7.0,1580,10168,12,18,27,4294952,0
59.166.0.2,51911,149.171.126.6,53,17,0.0,146,178,2,2,0,0,0
59.166.0.1,17727,149.171.126.9,5190,6,0.0,2158,2464,24,24,27,7,0
59.166.0.3,56144,149.171.126.0,5190,6,0.0,1470,1728,22,14,27,0,0<|eot_id|>
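
Because the output is plain CSV, it can be parsed and checked against the ranges stated in the prompt. The sketch below uses only the standard library; `parse_generated_csv` and `rows_with_port_out_of_range` are hypothetical helpers, and the sample text is two rows copied from the output above. It assumes the generated text has already been captured as a string rather than only streamed to stdout.

```python
import csv
import io

EOT = "<|eot_id|>"  # end-of-turn token that appears at the end of the output

def parse_generated_csv(text):
    """Strip the end-of-turn token and parse the CSV into a list of dicts."""
    cleaned = text.replace(EOT, "").strip()
    return list(csv.DictReader(io.StringIO(cleaned)))

def rows_with_port_out_of_range(rows, low=1024, high=65535):
    """Return indices of rows whose source or destination port is outside [low, high]."""
    bad = []
    for i, row in enumerate(rows):
        ports = (int(row["L4_SRC_PORT"]), int(row["L4_DST_PORT"]))
        if any(not (low <= p <= high) for p in ports):
            bad.append(i)
    return bad

sample = (
    "IPV4_SRC_ADDR,L4_SRC_PORT,IPV4_DST_ADDR,L4_DST_PORT,PROTOCOL,L7_PROTO,"
    "IN_BYTES,OUT_BYTES,IN_PKTS,OUT_PKTS,TCP_FLAGS,FLOW_DURATION_MILLISECONDS,Label\n"
    "175.45.176.3,65502,149.171.126.11,80,6,7.0,800,1338,10,10,27,1429,0\n"
    "59.166.0.0,13943,149.171.126.0,11862,6,36.0,2334,16822,36,38,27,9,0<|eot_id|>"
)

rows = parse_generated_csv(sample)
print(len(rows), "rows parsed")
print("Rows with a port outside 1024-65535:", rows_with_port_out_of_range(rows))
```

In this sample the first row is flagged because its destination port is 80, below the lower bound given in the prompt. Generated rows do not always respect the described ranges, so a check of this kind is worth running before the data is used downstream. All values are kept as strings after parsing; note that L7_PROTO appears as a float (for example 7.0) in the output even though the prompt declares it an integer.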
