# Python script to compute the parity of an N-bit binary number using an N-input neural
# network with L hidden layers and one output neuron, in PyTorch.

# Customizable parameters:
N = 5                      # Default number of input bits (and data bit-width)
L = 2                      # Number of hidden layers
hidden_layer_size = 10     # Default number of neurons per hidden layer
epochs = 10000
learning_rate = 0.003
min_loss_threshold = 0.01  # Stops training once the loss falls below this value

# In essence, parity is about the total count of 1s, while XOR is a convenient operation
# to compute the parity bit.

########################
"""
Python script to compute the parity of an N-bit binary number using an N-input neural
network with L hidden layers and one output neuron, in PyTorch.

The script calculates even parity. The code that determines the parity is inside the
generate_data function:

    parity = sum(bits) % 2  # Even parity

sum(bits) counts the number of 1s in the input bit sequence. The modulo operator % 2
returns 0 if the sum is even and 1 if the sum is odd; this result is assigned to the
parity variable. Because parity is set to 1 only when sum(bits) is odd, the parity bit
is 1 exactly when the number of data bits set to 1 is odd, and therefore this is an
even parity calculation: the parity bit is chosen so that the total number of 1s
(including the parity bit itself) in the sequence becomes even.[1]

Even parity is not about having an odd number of bits set to one; it is about ensuring
that the total number of 1s (data bits + parity bit) is even. This is how error
detection works with parity bits: the receiver knows it should always see an even
number of 1s, and if it encounters an odd number, it flags a transmission error.[1]

The script generates random data with a variable bit-width N and uses the computed
parity bit as the label. The combined (N + 1)-bit sequence with the parity bit appended
is never actually used in the script itself. It is the job of the neural network to
discover the parity calculation on its own during the training phase, where only the
input bits (without the parity bit) and the calculated parity bit are presented to the NN.

Includes modes for inference and pretraining with explicit gradient computations and
backpropagation. Displays the training loss in real time in a popup window.

Inspired by:
[1] Aug 28, 2024 YouTube interview of Juergen Schmidhuber: youtube =DP454c1K_vQ
    See also (seven years earlier): "True Artificial Intelligence will change
    everything | Juergen Schmidhuber | TEDxLakeComo", www.youtube =-Y7PLaxXUrs
    [Jürgen Schmidhuber, the father of generative AI, shares his groundbreaking work in
    deep learning and artificial intelligence. In this exclusive interview, he discusses
    the history of AI, some of his contributions to the field, and his vision for the
    future of intelligent machines. Schmidhuber offers unique insights into the
    exponential growth of technology and the potential impact of AI on humanity and the
    universe.]

In this interview, Schmidhuber stated that LLMs cannot compute the "parity" of the bits
in a binary sequence, but that recurrent NNs (RNNs) can. I wanted to know whether a
simple feed-forward NN can compute parity.
(If so, perhaps LLMs actually can compute parity if specifically trained to do so.)
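As a concrete illustration of the mapping the network is asked to learn (the same rule
used by generate_data below):

    bits = [1, 0, 1, 1, 0]   # three 1s -> odd count of 1s
    sum(bits) % 2            # -> 1; appending this parity bit makes the total count of 1s even (4)

    bits = [1, 0, 1, 0]      # two 1s -> even count of 1s
    sum(bits) % 2            # -> 0; the total count of 1s is already even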
Method of the Python script informed by:
    "Create a Basic Neural Network Model - Deep Learning with PyTorch 5" - YouTube =JHWqWIoac2I (2023-06-05)
    and "Building a Neural Network with PyTorch in 15 Minutes" - youtube =mozBidd58VQ

The loss, as defined, drops smoothly, with some bumps visible in the graph, depending on
the "random" weights preloaded into the model each run.

Example results:

    Epoch [1000/1000], Loss: 0.0395
    Test Accuracy: 1.0000

[Once, the model loss fell to 0.0642 after 2000 epochs and did not fall below 0.0637
after 9999 epochs. The final sample-tested prediction accuracy was 0.9700.]

This shows that the model typically does not need to drive the loss all the way below
0.01, because the separation at the final neuron between the outputs for the zero and
one cases is already greater than 0.5 earlier in training. The script is not
specifically written to optimize these margins; the margins just emerged.

Sample console output:

    done loading libraries
    Epoch [100/1000], Loss: 0.6247
    Epoch [200/1000], Loss: 0.4112
    Epoch [300/1000], Loss: 0.1990
    Epoch [400/1000], Loss: 0.0849
    Epoch [500/1000], Loss: 0.0413
    Epoch [600/1000], Loss: 0.0250
    Epoch [700/1000], Loss: 0.0172
    Epoch [800/1000], Loss: 0.0128
    Epoch [900/1000], Loss: 0.0100
    Epoch [1000/1000], Loss: 0.0081
    Test Accuracy: 1.0000
    Predictions tensor([1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 0., 1.,
                        0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0.,
                        0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1., 0., 1., 0., 0., 1., 1.,
                        0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 0.,
                        1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0.])
    Labels      tensor([1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 0., 1.,
                        0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0.,
                        0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1., 0., 1., 0., 0., 1., 1.,
                        0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 0.,
                        1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 1., 0., 0.])

The line that calculates test_outputs is:

    test_outputs = model(test_data)

This line uses the model object (an instance of the ParityNet class) and calls its
forward method; it is equivalent to test_outputs = model.forward(test_data). Here
test_data is a tensor holding all of the input bit sequences to check.

The number of neurons and activations that directly contribute to generating
test_outputs depends on the network architecture defined by N, L, and hidden_layer_size.

Output Layer: The final layer has one output neuron (because we are predicting a single
binary value - the parity). This neuron uses a Sigmoid activation function.

Last Hidden Layer: This layer has hidden_layer_size neurons, each using a ReLU
activation function. The output of each neuron in this layer feeds directly into the
single output neuron.

Previous Hidden Layers: If L > 1, there are L-1 earlier hidden layers, each also with
hidden_layer_size neurons and ReLU activations. The activations of each layer feed into
the next.

Input Layer: The input layer consists of N nodes which represent the input bits and are
directly connected to the first hidden layer. We might consider a linear activation
function to be applied to this input layer.

So, to generate test_outputs, you have the following activations:
    hidden_layer_size * L ReLU activations in the hidden layers,
    1 Sigmoid activation in the output layer,
    N linear activations in the input layer (optional).
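For example, with the default hyperparameters (N = 5, L = 2, hidden_layer_size = 10),
the stack built by ParityNet below is equivalent to

    nn.Sequential(
        nn.Linear(5, 10), nn.ReLU(),
        nn.Linear(10, 10), nn.ReLU(),
        nn.Linear(10, 1), nn.Sigmoid(),
    )

so a single forward pass uses 2 * 10 = 20 ReLU activations and 1 Sigmoid activation,
in addition to the 5 input nodes.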
In summary, the hidden_layer_size neurons with ReLU activations in the last hidden
layer and the 1 neuron with a Sigmoid activation in the output layer immediately
generate the test_outputs values. All the other (L-1) * hidden_layer_size neurons with
ReLU activations in the preceding hidden layers contribute indirectly by feeding into
the last hidden layer. Each input bit is handled individually by one of the N nodes of
the input layer, which might be considered to have a linear activation or no activation
at all.
"""
########################
print("load libraries")
print("import torch")
import torch
print("import torch.nn as nn")
import torch.nn as nn
print("import numpy as np")
import numpy as np
print("import matplotlib.pyplot as plt # For the popup error plot")
import matplotlib.pyplot as plt  # For the popup error plot
print("import random")
import random
print("done loading libraries")

# Generate training data (the parity bit is the XOR of all data bits)
def generate_data(num_samples, num_bits):
    data = []
    labels = []
    for _ in range(num_samples):
        bits = [random.randint(0, 1) for _ in range(num_bits)]
        parity = sum(bits) % 2  # Even parity: % 2 returns 0 if the sum is even, 1 if the sum is odd.
        data.append(bits)
        labels.append(parity)
    return torch.tensor(data, dtype=torch.float32), torch.tensor(labels, dtype=torch.float32).reshape(-1, 1)

train_data, train_labels = generate_data(1000, N)  # 1000 N-bit numbers generated for training data.
test_data, test_labels = generate_data(100, N)     # 100 N-bit numbers generated for test data.

# Define the neural network model
class ParityNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_hidden_layers, output_size):
        super(ParityNet, self).__init__()
        layers = []
        layers.append(nn.Linear(input_size, hidden_size))
        layers.append(nn.ReLU())  # Activation function
        for _ in range(num_hidden_layers - 1):
            layers.append(nn.Linear(hidden_size, hidden_size))
            layers.append(nn.ReLU())  # Activation function
        layers.append(nn.Linear(hidden_size, output_size))
        layers.append(nn.Sigmoid())  # Output layer activation function for binary classification
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)

# Create the model instance
model = ParityNet(N, hidden_layer_size, L, 1)

# Loss function and optimizer
criterion = nn.BCELoss()  # Binary Cross Entropy Loss
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Pretraining loop
losses = []  # Store losses for plotting
plt.ion()    # Turn on interactive plotting
fig, ax = plt.subplots()  # Create plot objects outside the loop

######################## TRAINING ########################
for epoch in range(epochs):
    # Forward pass
    outputs = model(train_data)
    loss = criterion(outputs, train_labels)

    # Explicit gradient computation and backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    losses.append(loss.item())

    ################
    # ADD: If loss.item() < (min_loss_threshold + 0.01), save the failed train_data values to
    # failed_train_data[] and append them to failed_train_data.txt. Thus, before the end of
    # training, grab the problem bit values and accumulate these difficult training cases in
    # an external file. In future versions, use these hardest-to-train values to somehow
    # extra-train the model.
    ################
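    # --- Hedged sketch of the "ADD" idea above (not part of the original training loop). ---
    # It assumes the file name failed_train_data.txt and uses "still misclassified at the 0.5
    # threshold" as the criterion for a difficult sample; both choices are assumptions.
    if loss.item() < (min_loss_threshold + 0.01):
        with torch.no_grad():
            wrong_mask = ((outputs > 0.5).float() != train_labels).flatten()
            failed_train_data = train_data[wrong_mask]
        if failed_train_data.numel() > 0:
            with open("failed_train_data.txt", "a") as f:
                for row in failed_train_data:
                    f.write(" ".join(str(int(b)) for b in row.tolist()) + "\n")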
Stopping training.") break # Exit the training loop if (epoch + 1) % 100 == 0: print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}') # Update the plot in real time ax.clear() # Clear previous plot ax.plot(losses) # plot loss history ax.set_title("Training Loss") ax.set_xlabel("Epoch (x 100)") ax.set_ylabel("Loss") plt.draw() plt.pause(0.01) # Brief pause for plot to update plt.ioff() # Turn off interactive mode so plot persists after training is done plt.show() # show final plot. ######################## INFERENCE-TESTING ######################## # Inference (Testing) with torch.no_grad(): test_outputs = model(test_data) predicted = (test_outputs > 0.5).float() # Convert probabilities to binary predictions (0 or 1) accuracy = (predicted == test_labels).sum() / len(test_labels) print(f'Test Accuracy: {accuracy:.4f}') print("Predictions", predicted.flatten()) print("Labels ", test_labels.flatten()) # Separate margins for predictions of 1 and 0 margins_ones = test_outputs[predicted == 1] - 0.5 margins_zeros = 0.5 - test_outputs[predicted == 0] # Calculate and print statistics for margins of 1s if margins_ones.numel() > 0: # Check if there are any predictions of 1 min_margin_ones = margins_ones.min().item() max_margin_ones = margins_ones.max().item() avg_margin_ones = margins_ones.mean().item() print(f"Min Margin (Ones): {min_margin_ones:.2f}") print(f"Max Margin (Ones): {max_margin_ones:.2f}") print(f"Avg Margin (Ones): {avg_margin_ones:.2f}") print("Margins (Ones):", margins_ones.flatten().numpy()) else: print("No predictions of 1 in the test dataset.") # Calculate and print statistics for margins of 0s if margins_zeros.numel() > 0: # Check if there are any predictions of 0 min_margin_zeros = margins_zeros.min().item() max_margin_zeros = margins_zeros.max().item() avg_margin_zeros = margins_zeros.mean().item() print(f"Min Margin (Zeros): {min_margin_zeros:.2f}") print(f"Max Margin (Zeros): {max_margin_zeros:.2f}") print(f"Avg Margin (Zeros): {avg_margin_zeros:.2f}") print("Margins (Zeros):", margins_zeros.flatten().numpy()) else: print("No predictions of 0 in the test dataset.") ######################## ADD EXPORT WORKING-MODEL WEIGHTS TO PARTIY-[hyperparameters]NN_weights.bin HERE ######################## ######################## ADD A USER-INPUT (BINARY SEQUENCE) MODE HERE ########################