load_model()
Loads the GPT-2 model and tokenizer from the specified directory paths.
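A minimal sketch of what load_model() might look like, assuming the Hugging Face transformers library; the directory path, device handling, and global names are illustrative, not the project's actual values:

```python
# Hypothetical sketch of load_model(); paths and globals are assumptions.
from transformers import GPT2TokenizerFast, GPT2LMHeadModel
import torch

MODEL_DIR = "./model"          # assumed local checkpoint directory
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

model, tokenizer = None, None  # module-level globals populated at startup

def load_model(model_dir: str = MODEL_DIR):
    """Load the GPT-2 checkpoint and its tokenizer from local paths."""
    global model, tokenizer
    tokenizer = GPT2TokenizerFast.from_pretrained(model_dir)
    model = GPT2LMHeadModel.from_pretrained(model_dir).to(DEVICE)
    model.eval()               # inference only; disables dropout
    return model, tokenizer
```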
lifespan()
Manages the application lifecycle. Initializes the model at startup and handles cleanup on shutdown.
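A sketch of the lifespan pattern, assuming FastAPI's asynccontextmanager-based lifespan hook; the cleanup step is a placeholder:

```python
# Sketch of a FastAPI lifespan handler; load_model() is the function above.
from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    load_model()        # initialize model/tokenizer once at startup
    yield               # application serves requests here
    # optional cleanup on shutdown, e.g. releasing GPU memory

app = FastAPI(lifespan=lifespan)
```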
classify_text_sync()
Synchronously tokenizes the input text and runs inference with the GPT-2 model. Returns the classification and perplexity.
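A possible shape for classify_text_sync(), continuing the globals from the load_model() sketch and assuming perplexity is derived from the GPT-2 language-modeling loss; the threshold and labels are illustrative, not the project's actual decision rule:

```python
import math
import torch

def classify_text_sync(text: str) -> dict:
    """Tokenize, compute GPT-2 perplexity, and map it to a label (sketch)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024).to(DEVICE)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])   # LM loss over the input
    perplexity = math.exp(out.loss.item())
    # Illustrative rule: low perplexity under the model suggests AI-like text.
    label = "AI-generated" if perplexity < 60 else "Human-written"
    return {"classification": label, "perplexity": perplexity}
```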
classify_text()
Asynchronously runs classify_text_sync() in a thread pool for non-blocking text classification.
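A sketch using Starlette's run_in_threadpool (asyncio.to_thread would work equally well); it assumes the classify_text_sync() shown above:

```python
from starlette.concurrency import run_in_threadpool

async def classify_text(text: str) -> dict:
    """Run the blocking classifier off the event loop."""
    return await run_in_threadpool(classify_text_sync, text)
```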
analyze_text()
POST endpoint: Accepts text input, classifies it using classify_text(), and returns the result with perplexity.
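A sketch of the endpoint; the route path and request schema are assumptions:

```python
from pydantic import BaseModel

class TextInput(BaseModel):
    text: str

@app.post("/analyze")          # route name is an assumption
async def analyze_text(payload: TextInput):
    result = await classify_text(payload.text)
    return result              # e.g. {"classification": ..., "perplexity": ...}
```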
health()
GET endpoint: Simple health check for API liveness.
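For completeness, a minimal liveness route (route name and response body assumed):

```python
@app.get("/health")
async def health():
    return {"status": "ok"}
```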
parse_docx(), parse_pdf(), parse_txt()
Utilities to extract and convert .docx, .pdf, and .txt file contents to plain text.
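A sketch of the three parsers, assuming python-docx and pypdf; the project may use different parsing libraries:

```python
from io import BytesIO
from docx import Document          # python-docx (assumed)
from pypdf import PdfReader        # pypdf (assumed)

def parse_docx(data: bytes) -> str:
    doc = Document(BytesIO(data))
    return "\n".join(p.text for p in doc.paragraphs)

def parse_pdf(data: bytes) -> str:
    reader = PdfReader(BytesIO(data))
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def parse_txt(data: bytes) -> str:
    return data.decode("utf-8", errors="ignore")
```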
warmup()
Downloads the model repository and initializes the model/tokenizer using load_model().
download_model_repo()
Downloads the model files into the designated MODEL folder.
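A sketch assuming the huggingface_hub client; the repository id and folder name are placeholders:

```python
from pathlib import Path
from huggingface_hub import snapshot_download

MODEL = "model"                                  # assumed local folder name
REPO_ID = "your-org/your-gpt2-detector"          # hypothetical repository id

def download_model_repo():
    """Fetch the model checkpoint into the MODEL folder (sketch)."""
    Path(MODEL).mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=MODEL)
```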
get_model_tokenizer()
Checks whether the model already exists locally; downloads it if missing, otherwise loads the cached model.
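A possible implementation of the cache check, reusing the hypothetical MODEL folder and load_model() from the sketches above:

```python
import os

def get_model_tokenizer():
    """Download the checkpoint only if it is not already cached locally."""
    if not os.path.isdir(MODEL) or not os.listdir(MODEL):
        download_model_repo()
    return load_model(MODEL)
```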
handle_file_upload()
Handles file uploads from the /upload route. Extracts the text, classifies it, and returns the results.
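A sketch of the upload route, assuming FastAPI's UploadFile and the extract_file_contents() helper described below; the helper's signature is an assumption:

```python
from fastapi import UploadFile, File

@app.post("/upload")                     # route name taken from the description
async def handle_file_upload(file: UploadFile = File(...)):
    contents = await file.read()
    text = extract_file_contents(contents, file.filename)
    return await classify_text(text)
```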
extract_file_contents()
Extracts and returns plain text from uploaded files (PDF, DOCX, TXT).
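A sketch of the extension-based dispatch to the parsers above; the signature is an assumption:

```python
def extract_file_contents(data: bytes, filename: str) -> str:
    """Dispatch on file extension to the matching parser (sketch)."""
    name = filename.lower()
    if name.endswith(".pdf"):
        return parse_pdf(data)
    if name.endswith(".docx"):
        return parse_docx(data)
    if name.endswith(".txt"):
        return parse_txt(data)
    raise ValueError(f"Unsupported file type: {filename}")
```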
handle_file_sentence()
Processes file uploads by analyzing each sentence (under 10,000 chars) before classification.
handle_sentence_level_analysis()
Validates and strips each sentence, then computes an AI/human likelihood for each.
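A sketch that reuses classify_text_sync() as the per-sentence scorer; the real likelihood computation may differ:

```python
def handle_sentence_level_analysis(sentences: list[str]) -> list[dict]:
    """Strip each sentence and attach a per-sentence score (sketch)."""
    results = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:                 # skip empty fragments
            continue
        scored = classify_text_sync(sentence)
        results.append({"sentence": sentence, **scored})
    return results
```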
analyze_sentences()
Splits paragraphs into sentences, classifies each, and returns all results.
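A sketch using a naive regex splitter; the project may use a dedicated sentence tokenizer such as NLTK:

```python
import re

def analyze_sentences(text: str) -> list[dict]:
    """Split paragraphs into sentences and classify each one (sketch)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return handle_sentence_level_analysis(sentences)
```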
analyze_sentence_file()
Like handle_file_sentence(); analyzes sentences in uploaded files.