washeed commited on
Commit
b692870
·
verified ·
1 Parent(s): 238ccc2

Upload 18 files

Browse files
Documentation.ipynb ADDED
@@ -0,0 +1,117 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "## ABC_1 DOCUMENTATION\n",
8
+ "\n",
9
+ "Dito ko lalagay lahat ng documentation and uses ng mga functions"
10
+ ]
11
+ },
12
+ {
13
+ "cell_type": "markdown",
14
+ "metadata": {},
15
+ "source": [
16
+ "### pdftoimage.py and docxtoimage.py\n",
17
+ "\n",
18
+ "bale parang naging utils file na lang toh kasi ginawa ko siya na pwede tawagin sa ibang python file nilagay ko na siya sa abc_1.py pareho and pinagsama ko sa isang function na pangalan convert_pages(folder_path,image_output,Max_pages) tas automatically na toh mag iiterate sa isang folder para gumawa ng mga buffer folders na may lamang image na page ng file ganto sample usage nya also si docx to image automatic na png na siya kasi best for ocr naman ang png \n"
19
+ ]
20
+ },
21
+ {
22
+ "cell_type": "code",
23
+ "execution_count": null,
24
+ "metadata": {
25
+ "vscode": {
26
+ "languageId": "plaintext"
27
+ }
28
+ },
29
+ "outputs": [],
30
+ "source": [
31
+ "import abc_1\n",
32
+ "abc_1.convert_pages('input','png',4) \n",
33
+ "\n",
34
+ " \n",
35
+ "def convert_pages(folder_path, output_format ,max_pages):\n",
36
+ " for root, directories, files in os.walk(folder_path):\n",
37
+ " for filename in files:\n",
38
+ " # Get the file extension (including the dot)\n",
39
+ " extension = os.path.splitext(filename)[1].lower()\n",
40
+ " if extension=='.pdf':\n",
41
+ " pdftoimage.convert_pdfs(folder_path, output_format,max_pages)\n",
42
+ " if extension=='.docx':\n",
43
+ " docxtoimage.process(folder_path,max_pages)\n"
44
+ ]
45
+ },
46
+ {
47
+ "cell_type": "markdown",
48
+ "metadata": {},
49
+ "source": [
50
+ "### inputPDFToOutputOCR\n",
51
+ "\n",
52
+ "ito yung parang augmentA.py na for text. ginawa kong very similar yung parameters ng pagtakbo nya except dagdag lang ng konti kasi may subfolder toh. ito pinaka challenging na part sa buong project kasi kailangan match pa din yung categories_dict para di na kayo mahirapan mag bago bago ng formats and flows pero syempre ako na toh kaya possible. bale ginamit ko mga functions inside the abc_1 na den to categorize kaya walang issue yan with functions and outputs kasi same sila ng fundamental rules for categorization bale same dapat na babato sa file yung json categories like sa augment a then straightforward naman na from there"
53
+ ]
54
+ },
55
+ {
56
+ "cell_type": "code",
57
+ "execution_count": null,
58
+ "metadata": {
59
+ "vscode": {
60
+ "languageId": "plaintext"
61
+ }
62
+ },
63
+ "outputs": [],
64
+ "source": [
65
+ "## run muna yung convert_pages for this then delete folder pag taopos\n",
66
+ "\n",
67
+ "if __name__ == '__main__': \n",
68
+ " categories_keywords_dict = {\n",
69
+ " 'AI': ['Artificial', 'Intelligence'],\n",
70
+ " 'Automata': ['finite', 'state', 'machines'],\n",
71
+ " 'DT': ['game', 'theory']\n",
72
+ " }\n",
73
+ "\n",
74
+ " folder_path = 'input' #output folder ni pdftoimage toh\n",
75
+ " folder_output = 'output' # Fixed typo\n",
76
+ " compiled_keywords = abc_1.compile_keywords(categories_keywords_dict)\n",
77
+ "\n",
78
+ " subfolder_names = get_subfolder_names(folder_path)\n",
79
+ " runOCR(subfolder_names)"
80
+ ]
81
+ },
82
+ {
83
+ "cell_type": "markdown",
84
+ "metadata": {},
85
+ "source": [
86
+ "### LIMITATIONS\n",
87
+ "Discuss ko lang mga potential issues na makakaharap naten\n",
88
+ "\n",
89
+ "- file name delikado if may kapareho kasi ang way ng pag generate ng subfolder is file name tas pag mmove na yung file mismo lalagyan lang ng .pdf or .docx\n",
90
+ "\n",
91
+ "- next issue yung bilis. tinanggal ko lahat ng concurrency and threading functionalities pag dating sa OCR kasi may potential risk na makasira ng device kasi nga mabigat talaga sobra. \n",
92
+ "\n",
93
+ "- file path management. di ko alam if gagana yung program natin if nasa labas ng work folder yung ipprocess natin\n",
94
+ "\n"
95
+ ]
96
+ },
97
+ {
98
+ "cell_type": "markdown",
99
+ "metadata": {},
100
+ "source": [
101
+ "### JUSTIFICATIONS\n",
102
+ "- pwede tayo gumawa ng script na pang error handling ng file with same names\n",
103
+ "- justify natin na limitations ng machine yung bagal kasi di kaya\n",
104
+ "- ito idk pano ssolve or baka lang di ko alam pano\n",
105
+ "\n",
106
+ "for the integration part pwede nyo na siguro rin simulan and itry para easy na after magawa ng front. pero functionality wise i think kumpleto na tayo "
107
+ ]
108
+ }
109
+ ],
110
+ "metadata": {
111
+ "language_info": {
112
+ "name": "python"
113
+ }
114
+ },
115
+ "nbformat": 4,
116
+ "nbformat_minor": 2
117
+ }
abc_1.py ADDED
@@ -0,0 +1,114 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import ocr
2
+ import os
3
+ import threading
4
+ import concurrent.futures
5
+ from multiprocessing import Pool # Import for multiprocessing
6
+ import re
7
+ from docx import Document # Assuming DOCX support is desired
8
+ from pdfminer.high_level import extract_text # Import for PDF text extraction
9
+ import time
10
+ import pdftoimage
11
+ import docxtoimage
12
+ # Additional libraries for new file types
13
+ #import openpyxl # For basic XLSX handling (consider pandas for structured data)
14
+ #from pptx import Presentation # For PPTX presentations (install with: pip install python-pptx)
15
+
16
+ try:
17
+ from docx import Document
18
+ except ImportError:
19
+ print("To enable DOCX support, install python-docx: pip install python-docx")
20
+
21
+
22
+ class DecodingError(Exception):
23
+ pass
24
+
25
+
26
def compile_keywords(categories_keywords_dict):
    """Pre-compile every keyword into a case-insensitive regex.

    Args:
        categories_keywords_dict: mapping of category name -> list of
            keyword strings.

    Returns:
        dict: same keys, with each keyword replaced by a compiled
        ``re.Pattern`` (IGNORECASE) for faster repeated matching.
    """
    compiled = {}
    for category, keywords in categories_keywords_dict.items():
        compiled[category] = [re.compile(word, re.IGNORECASE) for word in keywords]
    return compiled
31
+
32
+
33
def categorize_text_chunk(text_chunk, compiled_keywords):
    """Return the first category whose keywords ALL occur in the text.

    Args:
        text_chunk: text to classify.
        compiled_keywords: mapping category -> list of compiled regex
            patterns (see ``compile_keywords``).

    Returns:
        str: the matching category name, or 'Uncategorized' when no
        category has every one of its keywords present.
    """
    for category, patterns in compiled_keywords.items():
        matches_all = all(pattern.search(text_chunk) for pattern in patterns)
        if matches_all:
            return category
    return 'Uncategorized'
39
+
40
def use_ocr(folder_path):
    """Run OCR over every .jpg/.png image in ``folder_path`` (a buffer
    folder of rendered pages) and return the concatenated text.

    NOTE(review): relies on the project-local ``ocr`` module; assumes
    ``ocr.extract_text_from_image`` returns a list of strings — confirm.
    """
    collected = []
    for filename in os.listdir(folder_path):
        if not (filename.endswith(".jpg") or filename.endswith(".png")):
            continue
        image_path = os.path.join(folder_path, filename)
        lines = ocr.extract_text_from_image(image_path)
        # Double newline separates the text of consecutive pages.
        collected.append("\n".join(lines) + "\n\n")
    return "".join(collected)
49
+
50
+
51
+
52
+
53
def convert_pages(folder_path, output_format, max_pages):
    """Render documents in ``folder_path`` into per-file image folders.

    Scans the folder tree for .pdf / .docx files and invokes the matching
    converter, which writes up to ``max_pages`` page images per document
    into a buffer sub-folder named after the file.

    Args:
        folder_path: folder containing the documents to convert.
        output_format: image format for PDF pages (e.g. 'png').
        max_pages: maximum number of pages to render per document.

    BUG FIX: the original called the folder-level converters once per
    matching *file*, so a folder holding N PDFs converted the whole
    folder N times over. Each converter now runs at most once.
    """
    has_pdf = False
    has_docx = False
    for root, directories, files in os.walk(folder_path):
        for filename in files:
            extension = os.path.splitext(filename)[1].lower()
            if extension == '.pdf':
                has_pdf = True
            elif extension == '.docx':
                has_docx = True
    if has_pdf:
        pdftoimage.convert_pdfs(folder_path, output_format, max_pages)
    if has_docx:
        # DOCX pages are always rendered as PNG (best for OCR).
        docxtoimage.process(folder_path, max_pages)
62
+
63
+
64
def categorize_file(file_path, compiled_keywords):
    """Extract the text of a single file and categorize it.

    Args:
        file_path: path to a .pdf, .docx or .txt file.
        compiled_keywords: mapping category -> list of compiled regexes.

    Returns:
        tuple: (file_path, category). ``category`` is
        'Uncategorized (Error)' on extraction failure, and the pair
        (None, 'Unsupported File Type') for any other extension.
    """
    try:
        if file_path.endswith('.pdf'):
            # pdfminer text extraction — CPU bound.
            text = extract_text(file_path)
            return file_path, categorize_text_chunk(text, compiled_keywords)
        elif file_path.endswith('.docx') and Document:
            try:
                doc = Document(file_path)
                # Join all paragraphs into one searchable blob.
                text = '\n'.join(paragraph.text for paragraph in doc.paragraphs)
                return file_path, categorize_text_chunk(text, compiled_keywords)
            except Exception as e:
                print(f"Error processing DOCX '{file_path}': {e}")
                return file_path, 'Uncategorized (Error)'
        elif file_path.endswith('.txt'):
            # BUG FIX: read with an explicit encoding so categorization does
            # not depend on the platform's default codec; undecodable bytes
            # are replaced rather than raising.
            with open(file_path, 'r', encoding='utf-8', errors='replace') as f:
                text = f.read()
            return file_path, categorize_text_chunk(text, compiled_keywords)
        else:
            print(f"Unsupported file type: {file_path}")
            return None, 'Unsupported File Type'
    except Exception as e:
        print(f"Error processing '{file_path}': {e}")
        return file_path, 'Uncategorized (Error)'
88
+
89
+
90
def threaded_worker(file_paths_categories, output_dir):
    """Move each categorized file into ``output_dir/<category>/``.

    Args:
        file_paths_categories: iterable of (file_path, category) pairs as
            produced by ``categorize_file``.
        output_dir: root folder that receives one sub-folder per category.

    BUG FIX: unsupported files come back from ``categorize_file`` as
    (None, 'Unsupported File Type'); the original only checked the
    category, so it crashed on ``os.path.basename(None)``. Entries with a
    missing path are now skipped as well.
    """
    for file_path, category in file_paths_categories:
        if file_path is None or category is None:
            continue  # nothing to move
        category_dir = os.path.join(output_dir, category)
        os.makedirs(category_dir, exist_ok=True)
        # NOTE(review): os.rename fails across filesystems; acceptable
        # while input and output live on the same volume.
        os.rename(file_path, os.path.join(category_dir, os.path.basename(file_path)))
96
+
97
+
98
def multi_process_categorizer(input_dir, output_dir, categories_keywords_dict, num_processes):
    """Categorize every file in ``input_dir`` and file it under ``output_dir``.

    Text extraction and matching run in a process pool (CPU bound); the
    resulting file moves run in a thread pool (I/O bound).

    NOTE(review): despite the parameter name, the caller in augmentA.py
    passes the already-compiled keyword mapping here — confirm before
    renaming or re-compiling.
    """
    files = [os.path.join(input_dir, name) for name in os.listdir(input_dir)]

    # CPU-bound extraction + categorization in worker processes.
    jobs = [(path, categories_keywords_dict) for path in files]
    with Pool(processes=num_processes) as pool:
        results = pool.starmap(categorize_file, jobs)

    # I/O-bound moving in a thread; the executor's shutdown waits for it.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.submit(threaded_worker, results, output_dir)
108
+
109
+
110
def chunks(lst, chunk_size):
    """Yield successive ``chunk_size``-sized slices of ``lst``.

    The final chunk may be shorter than ``chunk_size``.
    """
    start = 0
    while start < len(lst):
        yield lst[start:start + chunk_size]
        start += chunk_size
114
+
augmentA.py ADDED
@@ -0,0 +1,26 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import abc_1
2
+ import time
3
+ import sys
4
+ from docx import Document # Assuming DOCX support is desired
5
+ from pdfminer.high_level import extract_text # Import for PDF text extraction
6
+ import json
7
+
8
if __name__ == '__main__':
    start = time.time()

    # Default keyword sets, used only when no JSON payload is supplied.
    categories_keywords_dict = {
        'AI': ['Artificial', 'Intelligence'],
        'Automata': ['finite', 'state', 'machines'],
        'DT': ['game', 'theory'],
    }

    if len(sys.argv) > 1:
        # BUG FIX: the original parsed the CLI JSON but then always used
        # the hardcoded dict (``categories_keywords_dict1``). The payload
        # produced by tk3.py now actually drives categorization.
        categories_keywords_dict = json.loads(sys.argv[1])
    else:
        print("No data provided.")

    input_dir = 'input'    # folder to categorize (was ``input``, shadowing the builtin)
    output_dir = 'output'  # categorized files land here
    compiled_keywords = abc_1.compile_keywords(categories_keywords_dict)
    abc_1.multi_process_categorizer(input_dir, output_dir, compiled_keywords, num_processes=8)  # Adjust processes as needed

    end = time.time()
    print(f"Categorization completed in {end - start:.2f} seconds")
best/BEST.pth ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4a5efbfb48b4081100544e75e1e2b57f8de3d84f213004b14b85fd4b3748db17
3
+ size 83152330
docxtoimage.py ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from spire.doc import *
3
+ from spire.doc.common import *
4
+
5
def process(folder_path, max_page):
    """Convert every .docx directly inside ``folder_path`` to page images
    (delegates each file to ``process_docx``)."""
    docx_files = (name for name in os.listdir(folder_path) if name.endswith(".docx"))
    for filename in docx_files:
        process_docx(folder_path, filename, max_page)
9
+
10
def process_docx(folder_path, filename, max_page=None):
    """Render the first ``max_page`` pages of one DOCX file to PNG images.

    Images are written to ``folder_path/<file stem>/<file stem>_<n>.png``.

    Args:
        folder_path: folder containing the document.
        filename: name of the .docx file inside ``folder_path``.
        max_page: page-count cap; ``None`` (the default) renders all pages.
            BUG FIX: the original compared ``None > int`` and crashed when
            the default was used.
    """
    try:
        # Construct the full file path
        file_path = os.path.join(folder_path, filename)

        # Load the document via Spire.Doc.
        document = Document()
        document.LoadFromFile(file_path)

        # Clamp the requested page count to what the document actually has.
        page_count = document.GetPageCount()
        if max_page is None or max_page > page_count:
            last_page = page_count
        else:
            last_page = max_page
        image_streams = document.SaveImageToStreams(0, last_page, ImageType.Bitmap)

        # One buffer folder per document, named after the file stem.
        file_name, _ = os.path.splitext(filename)
        image_folder_path = os.path.join(folder_path, file_name)
        os.makedirs(image_folder_path, exist_ok=True)

        # Save each image stream as PNG (best suited for the OCR stage).
        for i, image in enumerate(image_streams):
            image_name = os.path.join(image_folder_path, f"{file_name}_{i+1}.png")
            with open(image_name, 'wb') as image_file:
                image_file.write(image.ToArray())

        document.Close()
    except Exception as e:
        # BUG FIX: the original printed "(unknown)" — include the filename.
        print(f"Error processing file {filename}: {e}")
39
+
40
+
41
+ if __name__ == '__main__':
42
+ # Define the folder path
43
+ folder_path = "input"
44
+ max_page=4
45
+ process(folder_path,max_page)
dump.py ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import sys
2
+
3
def main(data):
    """Print the received payload and its Python type (debug helper)."""
    for item in ("Received data:", data, type(data)):
        print(item)
7
+
8
+ if __name__ == "__main__":
9
+ # Check if data argument is provided
10
+ if len(sys.argv) > 1:
11
+ data = sys.argv[1]
12
+ main(data)
13
+ else:
14
+ print("No data provided.")
english_g2/english_g2.pth ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e2272681d9d67a04e2dff396b6e95077bc19001f8f6d3593c307b9852e1c29e8
3
+ size 15143997
inputPDFToOutputOCR.py ADDED
@@ -0,0 +1,86 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import abc_1
3
+ import shutil
4
+
5
def get_subfolder_names(folder_path):
    """Return the names (not full paths) of the immediate sub-folders of
    ``folder_path``; an empty list when the folder does not exist."""
    try:
        entries = os.listdir(folder_path)
    except FileNotFoundError:
        print(f"Error: Folder not found: {folder_path}")
        return []
    return [entry for entry in entries
            if os.path.isdir(os.path.join(folder_path, entry))]
12
+
13
+
14
def create_folder(folder_path):
    """Create ``folder_path`` (and any missing parents) if needed.

    Uses ``exist_ok=True`` instead of the original's separate existence
    check, removing the check-then-create race. Errors are reported, not
    raised, matching the original contract.
    """
    try:
        os.makedirs(folder_path, exist_ok=True)
    except OSError as e:
        print(f"Error creating folder {folder_path}: {e}")
21
+
22
+
23
def move_file(source_path, destination_path):
    """Move a file from ``source_path`` to ``destination_path``.

    BUG FIX: uses ``shutil.move`` instead of ``os.rename`` so moves work
    across filesystems/drives (``os.rename`` raises OSError in that
    case). Errors are still printed rather than raised.
    """
    try:
        shutil.move(source_path, destination_path)
    except OSError as e:
        print(f"Error moving file {source_path} to {destination_path}: {e}")
29
+
30
+
31
def process_file(folder_path, name):
    """OCR one buffer sub-folder, pick a category, move the source file.

    NOTE(review): depends on the module-level globals ``folder_output``
    and ``compiled_keywords`` being set before this runs (they are
    assigned in the ``__main__`` block) — confirm when reusing.
    """
    text = abc_1.use_ocr(os.path.join(folder_path, name))
    category = abc_1.categorize_text_chunk(text, compiled_keywords)

    category_folder = os.path.join(folder_output, category)
    create_folder(category_folder)

    has_pdf, has_docx = check_file_existence(folder_path, name)
    for extension, present in (('.pdf', has_pdf), ('.docx', has_docx)):
        if not present:
            continue
        source_file = os.path.join(folder_path, name + extension)
        destination_file = os.path.join(category_folder, name + extension)
        move_file(source_file, destination_file)
        print(f"File '{name}' categorized as '{category}' and moved to '{category_folder}'.")
49
+
50
+
51
def check_file_existence(folder_path, filename):
    """Report whether ``folder_path`` contains ``filename``.pdf and/or
    ``filename``.docx.

    Returns:
        tuple(bool, bool): (has_pdf, has_docx).
    """
    has_pdf = False
    has_docx = False

    for entry in os.listdir(folder_path):
        stem, ext = os.path.splitext(entry)
        if stem != filename:
            continue
        if ext == '.pdf':
            has_pdf = True
        elif ext == '.docx':
            has_docx = True

    return has_pdf, has_docx
64
+
65
def runOCR(subfolder_names):
    """OCR and categorize each buffer sub-folder, then delete the buffer.

    Args:
        subfolder_names: folder names inside the module-level
            ``folder_path``, one buffer folder per source document.

    Builds paths with ``os.path.join`` instead of manual '/'-string
    concatenation so separators are handled consistently.
    """
    for name in subfolder_names:
        process_file(folder_path, name)
        buffer_dir = os.path.join(folder_path, name)
        if os.path.exists(buffer_dir):  # buffer folder delete
            shutil.rmtree(buffer_dir)
70
+
71
+
72
+ if __name__ == '__main__':
73
+ categories_keywords_dict = {
74
+ 'AI': ['Artificial', 'Intelligence'],
75
+ 'Automata': ['finite', 'state', 'machines'],
76
+ 'DT': ['game', 'theory']
77
+ }
78
+
79
+ folder_path = 'input' #output folder ni pdftoimage toh
80
+ folder_output = 'output' # Fixed typo
81
+ compiled_keywords = abc_1.compile_keywords(categories_keywords_dict)
82
+
83
+ subfolder_names = get_subfolder_names(folder_path)
84
+ runOCR(subfolder_names)
85
+
86
+
ocr.py ADDED
@@ -0,0 +1,44 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import easyocr
2
+ import cv2
3
+ import os
4
+
5
+
6
def extract_text_from_image(image_path, language='en'):
    """
    Extracts text from an image using EasyOCR.

    Args:
        image_path (str): Path to the image file.
        language (str, optional): Language(s) to be recognized. Defaults to 'en' (English).

    Returns:
        list: List of recognized text strings.

    PERF FIX: the original built a new ``easyocr.Reader`` (which loads
    model weights) on every call; readers are now cached per language on
    the function object. The custom detector path also used a
    Windows-only backslash literal — built with ``os.path.join`` instead.
    """
    cache = getattr(extract_text_from_image, '_readers', None)
    if cache is None:
        cache = {}
        extract_text_from_image._readers = cache

    reader = cache.get(language)
    if reader is None:
        reader = easyocr.Reader([language])
        reader.detector = reader.initDetector(os.path.join('best', 'BEST.pth'))
        cache[language] = reader

    image = cv2.imread(image_path)
    result = reader.readtext(image, detail=0)  # detail=0 -> plain strings only

    return result
25
+
26
+
27
+ if __name__ == '__main__':
28
+ # Define the folder path containing images
29
+ folder_path = "inference_results\Anil Maheshwari - Data analytics-McGraw-Hill Education (2017)"
30
+
31
+ # Create an empty string to store all concatenated text
32
+ all_extracted_text = ""
33
+
34
+ # Loop through all files in the folder
35
+ for filename in os.listdir(folder_path):
36
+ if filename.endswith(".jpg") or filename.endswith(".png"):
37
+ image_path = os.path.join(folder_path, filename)
38
+
39
+ # Extract text for current image
40
+ extracted_text = extract_text_from_image(image_path)
41
+
42
+ # Concatenate extracted text with a newline character
43
+ all_extracted_text += "\n".join(extracted_text) + "\n\n" # Add double newlines for separation
44
+
output/AI/ai.pdf ADDED
Binary file (74.4 kB). View file
 
output/Automata/fsm.pdf ADDED
Binary file (73.3 kB). View file
 
output/DT/gt.pdf ADDED
Binary file (75.3 kB). View file
 
output/Uncategorized/sjhkdf.docx ADDED
Binary file (13.3 kB). View file
 
pdftoimage.py ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from pdf2image import convert_from_path
3
+
4
+
5
def convert_pdf_to_images(pdf_path, output_format="png", max_pages=None):
    """Render one PDF into page images inside a sibling buffer folder.

    Args:
        pdf_path (str): Path to the PDF file.
        output_format (str, optional): Image format — "png", "jpg" or "ppm"
            (default: "png").
        max_pages (int, optional): Cap on the number of pages rendered
            (default: None, all pages).

    Output files are named ``page_<n>.<format>`` inside a folder named
    after the PDF, created next to it. Conversion errors are printed,
    not raised.
    """
    try:
        pdf_name, _ = os.path.splitext(os.path.basename(pdf_path))
        pages = convert_from_path(pdf_path, fmt=output_format,
                                  first_page=1, last_page=max_pages or None)

        buffer_folder_path = os.path.join(os.path.dirname(pdf_path), pdf_name)
        os.makedirs(buffer_folder_path, exist_ok=True)

        for index, page_image in enumerate(pages, start=1):
            target = os.path.join(buffer_folder_path, f"page_{index}.{output_format}")
            page_image.save(target, output_format.upper())

    except Exception as e:
        print(f"Error converting {pdf_path}: {e}")
27
+
28
+
29
def convert_pdfs(pdf_folder_path, output_format="png", max_pages=None):
    """Convert every .pdf directly inside ``pdf_folder_path`` to images,
    one document at a time (no concurrency).

    Args:
        pdf_folder_path (str): Folder containing the PDF files.
        output_format (str, optional): "png", "jpg" or "ppm" (default "png").
        max_pages (int, optional): Per-document page cap (default: None,
            all pages).
    """
    pdf_names = [name for name in os.listdir(pdf_folder_path)
                 if name.endswith(".pdf")]
    for name in pdf_names:
        convert_pdf_to_images(os.path.join(pdf_folder_path, name),
                              output_format, max_pages)
43
+
44
+
45
+ # Example usage
46
+ #convert_pdfs("input", output_format="png", max_pages=2) # Convert PDFs to JPG, keeping only the first 2 pages
threadedABC.py ADDED
@@ -0,0 +1,102 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import numpy as np
2
+ import time
3
+ import threading
4
+
5
+ # Rastrigin function
6
def rastrigin_function(x):
    """Rastrigin benchmark: A*n + sum(x_i^2 - A*cos(2*pi*x_i)) with A=10.

    Global minimum is 0 at x = 0. ``x`` is a 1-D numpy array.
    """
    A = 10
    per_dim = x**2 - A * np.cos(2 * np.pi * x)
    return A * len(x) + np.sum(per_dim)
9
+
10
+ # Initialize control parameters
11
+ SN = 10000 # Number of food sources
12
+ MCN = 100000 # Maximum number of cycles
13
+ limit = 50 # Maximum number of exploitations for a solution
14
+ dimensionality = 2 # Dimensionality of the search space
15
+
16
+ # Shared variables among threads
17
+ food_sources_lock = threading.Lock()
18
+ trial_lock = threading.Lock()
19
+ cyc_lock = threading.Lock()
20
+ start_time_lock = threading.Lock()
21
+
22
+ food_sources = np.random.uniform(-5.12, 5.12, size=(SN, dimensionality)) # Initial random positions
23
+ trial = np.zeros(SN) # Initialize trial counters
24
+ cyc = 1 # Initial cycle
25
+ start_time = None # Start time
26
+
27
+ # Function for Employed Bees' Phase
28
def employed_bees_phase():
    """Employed bees: give every food source one random neighbour; keep it
    when it improves the objective, otherwise bump the source's trial
    counter.

    Mutates the module-level ``food_sources`` / ``trial`` arrays under
    their respective locks.
    """
    global food_sources, trial
    for i in range(SN):
        # Random perturbation of the current source.
        candidate = food_sources[i] + np.random.uniform(-0.5, 0.5, size=(dimensionality,))

        if rastrigin_function(candidate) < rastrigin_function(food_sources[i]):
            with food_sources_lock:
                food_sources[i] = candidate
                trial[i] = 0
        else:
            with trial_lock:
                trial[i] += 1
42
+
43
+ # Function for Onlooker Bees' Phase
44
def onlooker_bees_phase():
    """Onlooker bees: sample sources with probability proportional to a
    sigmoid of their trial counters, then attempt the same neighbourhood
    improvement as the employed phase.

    NOTE(review): weighting by the *trial* counter is unusual — a higher
    trial means more failed improvements; confirm this is the intended
    fitness proxy.
    """
    global food_sources, trial
    probabilities = 1 / (1 + np.exp(-trial))  # Use trial as a measure of fitness
    onlooker_indices = np.random.choice(SN, size=SN, p=probabilities / probabilities.sum())

    for i in onlooker_indices:
        # Random perturbation of the sampled source.
        candidate = food_sources[i] + np.random.uniform(-0.5, 0.5, size=(dimensionality,))

        if rastrigin_function(candidate) < rastrigin_function(food_sources[i]):
            with food_sources_lock:
                food_sources[i] = candidate
                trial[i] = 0
        else:
            with trial_lock:
                trial[i] += 1
61
+
62
+ # Function for Scout Bee Phase
63
def scout_bee_phase():
    """Scout bee: abandon the single most-exhausted source (trial > limit)
    and re-seed it uniformly in the [-5.12, 5.12] box, resetting its
    counter. Mutates the shared arrays under the food-sources lock."""
    global food_sources, trial
    worst = np.argmax(trial)
    if trial[worst] > limit:
        with food_sources_lock:
            food_sources[worst] = np.random.uniform(-5.12, 5.12, size=(dimensionality,))
            trial[worst] = 0
70
+
71
+ # Record start time
72
+ with start_time_lock:
73
+ start_time = time.time()
74
+
75
+ # Thread for Employed Bees' Phase
76
+ employed_thread = threading.Thread(target=employed_bees_phase)
77
+
78
+ # Thread for Onlooker Bees' Phase
79
+ onlooker_thread = threading.Thread(target=onlooker_bees_phase)
80
+
81
+ # Thread for Scout Bee Phase
82
+ scout_thread = threading.Thread(target=scout_bee_phase)
83
+
84
+ # Start all threads
85
+ employed_thread.start()
86
+ onlooker_thread.start()
87
+ scout_thread.start()
88
+
89
+ # Wait for all threads to finish
90
+ employed_thread.join()
91
+ onlooker_thread.join()
92
+ scout_thread.join()
93
+
94
+ # Record end time
95
+ end_time = time.time()
96
+
97
+ # Find the best solution
98
+ best_solution = food_sources[np.argmin([rastrigin_function(x) for x in food_sources])]
99
+
100
+ print("Best solution:", best_solution)
101
+ print("Objective function value at best solution:", rastrigin_function(best_solution))
102
+ print("Time taken:", end_time - start_time, "seconds")
tk3.py ADDED
@@ -0,0 +1,146 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import tkinter as tk
2
+ import tkinter.filedialog as filedialog
3
+ from tkinter import ttk
4
+ import os
5
+ import subprocess
6
+ import json
7
+
8
+ keyword_entries = [] # Declare globally
9
+
10
def browse_folder(): # input in abc.py
    """Ask the user for a folder, store it in ``file_path_var``, and echo
    the PDF/DOCX filenames it contains to the terminal."""
    chosen_dir = filedialog.askdirectory(
        initialdir="/",
        title="Select a Folder"
    )
    file_path_var.set(chosen_dir)

    # Only list the document types the pipeline understands.
    supported = ('.pdf', '.docx')
    for entry in os.listdir(chosen_dir):
        if os.path.splitext(entry)[1].lower() in supported:
            print(entry)
21
+
22
def generate_textboxes():
    """Rebuild the keyword Entry widgets from the current category
    dropdown selections, then add the Categorize button below them."""
    global keyword_entries

    # Snapshot (name, keyword-count) for every configured category frame.
    category_data = [
        (category_var[frame].get(), int(keyword_vars[frame].get()))
        for frame in category_frames
    ]

    # Drop widgets left over from a previous generation.
    clear_existing_textboxes()
    keyword_entries.clear()

    for category_name, num_keywords in category_data:
        label = tk.Label(root, text=f"{category_name}:")
        label.pack()
        for _ in range(num_keywords):
            entry = tk.Entry(root)
            entry.pack()
            keyword_entries.append(entry)

    # Save/categorize button goes below the freshly generated textboxes.
    save_button = tk.Button(root, text="Categorize", command=save_to_backup)
    save_button.pack()
49
+
50
+
51
def save_to_backup(): # connect to
    """Collect category -> keyword lists from the UI widgets and hand them
    to augmentA.py as a JSON command-line argument.

    NOTE(review): hardcodes the "python3" executable; consider
    ``sys.executable`` for Windows portability.
    """
    global keyword_entries

    category_data = {}

    # Walk the entries in the same order they were generated, slicing the
    # flat keyword_entries list by each category's declared count.
    cursor = 0
    for frame in category_frames:
        category_name = category_var[frame].get()
        num_keywords = int(keyword_vars[frame].get())

        entries = keyword_entries[cursor:cursor + num_keywords]
        category_data[category_name] = [entry.get() for entry in entries]

        cursor += num_keywords

    subprocess.run(["python3", "augmentA.py", json.dumps(category_data)])
69
+
70
+
71
def clear_existing_textboxes():
    """Destroy every Label/Entry directly under the root window (the
    dynamically generated keyword widgets)."""
    for widget in root.winfo_children():
        if isinstance(widget, (tk.Label, tk.Entry)):
            widget.destroy()
75
+
76
+
77
def update_category_dropdowns():
    """Recreate one (category-name entry, keyword-count combobox) frame
    per category, according to ``num_categories_var``."""
    # Throw away the previous set of frames.
    for frame in category_frames:
        frame.destroy()
    category_frames.clear()

    keyword_options = [1, 2, 3, 4, 5]
    for _ in range(num_categories_var.get()):
        frame = tk.Frame(root)
        frame.pack()
        category_frames.append(frame)

        tk.Label(frame, text="Category Name:").pack()
        category_var[frame] = tk.StringVar(frame)
        tk.Entry(frame, textvariable=category_var[frame]).pack()

        tk.Label(frame, text="Number of Keywords:").pack()
        keyword_vars[frame] = tk.IntVar(frame)
        ttk.Combobox(frame, textvariable=keyword_vars[frame],
                     values=keyword_options).pack()
99
+
100
+
101
+
102
+
103
+
104
+ # --- Main Program ---
105
+ root = tk.Tk()
106
+ root.title("BuzzMatchTester")
107
+
108
+ # UI Elements for File Input
109
+ file_frame = tk.Frame(root) # Frame to hold file path and button
110
+ file_frame.pack()
111
+
112
+ file_path_label = tk.Label(file_frame, text="File Path:")
113
+ file_path_label.pack(side='left')
114
+
115
+ file_path_var = tk.StringVar(root)
116
+ file_path_entry = tk.Entry(file_frame, textvariable=file_path_var)
117
+ file_path_entry.pack(side='left')
118
+
119
+ browse_button = tk.Button(file_frame, text="Browse Folder", command=browse_folder) # Change browse_file to browse_folder
120
+ browse_button.pack(side='left')
121
+
122
+ # Dropdown for Number of Categories
123
+ num_categories_label = tk.Label(root, text="Number of Categories:")
124
+ num_categories_label.pack()
125
+
126
+ num_categories_options = [0,1, 2, 3, 4, 5]
127
+ num_categories_var = tk.IntVar(root)
128
+ num_categories_var.set(num_categories_options[0])
129
+ num_categories_dropdown = ttk.Combobox(root, textvariable=num_categories_var,
130
+ values=num_categories_options)
131
+ num_categories_dropdown.pack()
132
+
133
+ category_frames = []
134
+ category_var = {}
135
+ keyword_vars = {}
136
+
137
+ update_category_dropdowns() # Initial setup
138
+
139
+
140
+ # Generate Button
141
+ generate_button = tk.Button(root, text="Generate Textboxes", command=generate_textboxes)
142
+ generate_button.pack()
143
+
144
+ num_categories_dropdown.bind("<<ComboboxSelected>>", lambda _: update_category_dropdowns()) # After updating dropdowns
145
+
146
+ root.mainloop()
traditionalABC.py ADDED
@@ -0,0 +1,57 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import numpy as np
2
+ import time
3
+
4
+ # Rastrigin function
5
def rastrigin_function(x):
    """Rastrigin benchmark (A=10): multimodal test objective with global
    minimum 0 at the origin; ``x`` is a 1-D numpy array."""
    A = 10
    return A * len(x) + np.sum(np.square(x) - A * np.cos(2 * np.pi * x))
8
+
9
+ # Initialize control parameters
10
+ SN = 10000 # Number of food sources
11
+ MCN = 100000 # Maximum number of cycles
12
+ limit = 50 # Maximum number of exploitations for a solution
13
+ dimensionality = 2 # Dimensionality of the search space
14
+
15
+ # Shared variables
16
+ food_sources = np.random.uniform(-5.12, 5.12, size=(SN, dimensionality)) # Initial random positions
17
+ trial = np.zeros(SN) # Initialize trial counters
18
+
19
+ # Main ABC loop
20
+ start_time = time.time()
21
+
22
+ for cyc in range(1, MCN + 1):
23
+ # Employed Bees' Phase
24
+ for i in range(SN):
25
+ x_hat = food_sources[i] + np.random.uniform(-0.5, 0.5, size=(dimensionality,))
26
+ if rastrigin_function(x_hat) < rastrigin_function(food_sources[i]):
27
+ food_sources[i] = x_hat
28
+ trial[i] = 0
29
+ else:
30
+ trial[i] += 1
31
+
32
+ # Onlooker Bees' Phase
33
+ probabilities = 1 / (1 + np.exp(-trial))
34
+ onlooker_indices = np.random.choice(SN, size=SN, p=probabilities / probabilities.sum())
35
+
36
+ for i in onlooker_indices:
37
+ x_hat = food_sources[i] + np.random.uniform(-0.5, 0.5, size=(dimensionality,))
38
+ if rastrigin_function(x_hat) < rastrigin_function(food_sources[i]):
39
+ food_sources[i] = x_hat
40
+ trial[i] = 0
41
+ else:
42
+ trial[i] += 1
43
+
44
+ # Scout Bee Phase
45
+ max_trial_index = np.argmax(trial)
46
+ if trial[max_trial_index] > limit:
47
+ food_sources[max_trial_index] = np.random.uniform(-5.12, 5.12, size=(dimensionality,))
48
+ trial[max_trial_index] = 0
49
+
50
+ end_time = time.time()
51
+
52
+ # Find the best solution
53
+ best_solution = food_sources[np.argmin([rastrigin_function(x) for x in food_sources])]
54
+
55
+ print("Best solution:", best_solution)
56
+ print("Objective function value at best solution:", rastrigin_function(best_solution))
57
+ print("Time taken:", end_time - start_time, "seconds")
try.py ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ import abc_1
2
+
3
+ abc_1.convert_pages('input','png',4)