# Python script to compute parity of an N-bit binary number using an N-input neural network with L hidden layers, and one output neuron, in PyTorch.
# Customizable parameters:
N = 5  # Default number of input bits (and data bit-width)
L = 2  # Number of hidden layers
hidden_layer_size = 10 # Default number of neurons per hidden layer
epochs = 10000
learning_rate = 0.003
min_loss_threshold = 0.01 #(stops training epochs when loss reaches less than this number)

# In essence, parity is about the total count of 1s, while XOR is a convenient operation to compute the parity bit.


########################

"""
Python script to compute parity of an N-bit binary number using an N-input neural network with L hidden layers, and one output neuron, in PyTorch.

The Python script calculates even parity. The code that determines the parity is within the generate_data function:

parity = sum(bits) % 2  # Even parity

The sum(bits) calculates the number of 1s in the input bit sequence. The modulo operator % 2 returns 0 if the sum is even, and 1 if the sum is odd. This result is assigned to the parity variable. Because parity is set to 1 only when sum(bits) is odd, parity is '1' when number of bits set to 1 is odd, and therefore this is an even parity calculation. This explicitly calculates even parity: the parity bit is set so the total number of 1s (including the parity bit itself) in the sequence becomes even.[1]

Even parity is not about having an odd number of bits set to one; it's about ensuring that the total number of 1s (data bits + parity bit) is even. This is how error detection works with parity bits: the receiver knows they should always see an even number of 1s. If they encounter an odd number, it flags a transmission error.[1] The script itself generates random data with a variable bit-width N and then adds this parity bit to compute labels. This combined sequence (with a length N + 1) is never used in the script itself. It is the job of the Neural Network to discover the parity calculation on its own during the training phase, where only the input bits without the parity bit, and the calculated parity bit, are presented to the NN.

Includes modes for inference and pretraining with explicit gradient computations and backpropagation.  Displays real-time error during training in a popup window.

Inspired by: [1] Aug 28, 2024 Youtube Interview of Juergen Schmidhuber Schmidhuber at youtube =DP454c1K_vQ
See also (seven years ago) True Artificial Intelligence will change everything | Juergen Schmidhuber | TEDxLakeComo www.youtube =-Y7PLaxXUrs

[Jürgen Schmidhuber, the father of generative AI shares his groundbreaking work in deep learning and artificial intelligence. In this exclusive interview, he discusses the history of AI, some of his contributions to the field, and his vision for the future of intelligent machines. Schmidhuber offers unique insights into the exponential growth of technology and the potential impact of AI on humanity and the universe.]

In this interview, Schmidhuber stated that LLMs cannot compute "parity" of bits in a binary number sequence, but that "Recurrent" NNs (RNNs) can compute "parity".   I wanted to know whether a simple feed forward NNs can compute "parity".   (If so, perhaps LLMs actually can compute "parity" if specifically trained to do so)

Method of the python script informed by:
    Create a Basic Neural Network Model - Deep Learning with PyTorch 5 - YouTube =JHWqWIoac2I (2023-06-05)
and
    Building a Neural Network with PyTorch in 15 Minutes  youtube =mozBidd58VQ

Loss, as defined, drops smothly and with some bumps visible in the graph, depending on the "random" weights preloaded into the model each run. 
Example results:
   Epoch [1000/1000], Loss: 0.0395
   Test Accuracy: 1.0000
[Once, model loss fell to 0.0642 after 2000 epochs and did not fall below .0637 after 9999 epochs.  The final sample-tested prediction Accuracy: 0.9700]
 This shows that the model typically does not need to run the loss all the way down to loss a less than .01 because the margins at the final neuron are earlier greater than .5 between zero/one logit values.   The model script is not specifically written to optimize these margins.  The margins just emerged.

done loading libraries
Epoch [100/1000], Loss: 0.6247
Epoch [200/1000], Loss: 0.4112
Epoch [300/1000], Loss: 0.1990
Epoch [400/1000], Loss: 0.0849
Epoch [500/1000], Loss: 0.0413
Epoch [600/1000], Loss: 0.0250
Epoch [700/1000], Loss: 0.0172
Epoch [800/1000], Loss: 0.0128
Epoch [900/1000], Loss: 0.0100
Epoch [1000/1000], Loss: 0.0081

Test Accuracy: 1.0000
Predictions tensor([1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0.,
        0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0.,
        0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1.,
        0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
        0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1.,
        0., 1., 0., 1., 0., 0., 0., 1., 0., 0.])
Labels tensor([1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0.,
        0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0.,
        0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1.,
        0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
        0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1.,
        0., 1., 0., 1., 0., 0., 0., 1., 0., 0.])


The line that calculates test_outputs is:
			test_outputs = model(test_data)

This line uses the model object (an instance of the ParityNet class) and calls its forward method. The code is equivalent to test_outputs = model.forward(test_data). In this line test_data represents a tensor with all the parity bit sequences to check.

The number of neurons and activations that directly contribute to generating test_outputs depends on the network architecture defined by N, L, and hidden_layer_size.

Output Layer: The final layer has one output neuron (because we are predicting a single binary value - the parity). This neuron uses a Sigmoid activation function.

Last Hidden Layer: This layer has hidden_layer_size neurons, each using a ReLU activation function. The output of each neuron in this layer feeds directly into the single output neuron.

Previous Hidden Layers: If L > 1, there are L-1 previous hidden layers, each also with hidden_layer_size neurons and ReLU activations. The activations of each layer feed into the next.

Input Layer: The input layer consists of N nodes which represent the input bits, and are directly connected to the first hidden layer. We might consider that a linear activation function is applied to such input layer.

So, to generate test_outputs, you have the following activations:

hidden_layer_size * L ReLU activations in the hidden layers.

1 Sigmoid activation in the output layer.

N linear activations in the input layer (optional).

In summary, hidden_layer_size neurons and ReLU activations in the last hidden layer and 1 neuron with a Sigmoid activation in the output layer immediately generate the test_outputs values. All the other (L-1) * hidden_layer_size neurons with ReLU activations in the preceding hidden layers indirectly contribute by feeding into the last hidden layer. Each of the input bits are treated individually via the N nodes of the input layer, that might be considered having a linear activation or no activation at all.

"""
########################

print("load libraries")
print("import torch")
import torch
print("import torch.nn as nn")
import torch.nn as nn
print("import numpy as np")
import numpy as np
print("import matplotlib.pyplot as plt  # For the popup error plot")
import matplotlib.pyplot as plt  # For the popup error plot
print("import random")
import random
print("done loading libraries")

# Generate training data (parity bit is XOR or all data bits)
def generate_data(num_samples, num_bits):
    data = []
    labels = []
    for _ in range(num_samples):
        bits = [random.randint(0, 1) for _ in range(num_bits)]
        parity = sum(bits) % 2  # Even parity: The modulo operator % 2 returns 0 if the sum is even, and 1 if the sum is odd. 
        data.append(bits)
        labels.append(parity)

    return torch.tensor(data, dtype=torch.float32), torch.tensor(labels, dtype=torch.float32).reshape(-1, 1)

train_data, train_labels = generate_data(1000, N) # 1000 N-bit numbers generated for training data.
test_data, test_labels = generate_data(100, N) # 100 N-bit numbers generated for test data.


# Define the neural network model
class ParityNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_hidden_layers, output_size):
        super(ParityNet, self).__init__()
        layers = []
        layers.append(nn.Linear(input_size, hidden_size))
        layers.append(nn.ReLU()) # Activation function
        for _ in range(num_hidden_layers - 1):
            layers.append(nn.Linear(hidden_size, hidden_size))
            layers.append(nn.ReLU()) # Activation function
        layers.append(nn.Linear(hidden_size, output_size))
        layers.append(nn.Sigmoid()) # Output layer activation function for binary classification
        self.layers = nn.Sequential(*layers)


    def forward(self, x):
        return self.layers(x)


# Create the model instance
model = ParityNet(N, hidden_layer_size, L, 1)

# Loss function and optimizer
criterion = nn.BCELoss() # Binary Cross Entropy Loss
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Pretraining loop
losses = [] # Store losses for plotting
plt.ion()  # Turn on interactive plotting
fig, ax = plt.subplots() # Create plot objects outside the loop

######################## TRAINING ########################
for epoch in range(epochs):
    # Forward pass
    outputs = model(train_data)
    loss = criterion(outputs, train_labels)

    # Explicit gradient computation and backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


    losses.append(loss.item())

    ################ ADD:  If loss.item() < (min_loss_threshold +0.01) Then, save the failed train_data values to failed_train_data[] and append to failed_train_data.txt   Thus, before the end of training, grab the problem bit values and accumulate these difficult training cases in an external file.  In future versions, use these hardest-to-train values to somehow extra-train the model.  ################

    if loss.item() < min_loss_threshold:
        print(f"Reached minimum loss threshold of {min_loss_threshold} at epoch {epoch+1}. Stopping training.")
        break  # Exit the training loop

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

        # Update the plot in real time
        ax.clear()  # Clear previous plot
        ax.plot(losses) # plot loss history
        ax.set_title("Training Loss")
        ax.set_xlabel("Epoch (x 100)")
        ax.set_ylabel("Loss")
        plt.draw()
        plt.pause(0.01)  # Brief pause for plot to update


plt.ioff() # Turn off interactive mode so plot persists after training is done
plt.show() # show final plot.

######################## INFERENCE-TESTING ########################
# Inference (Testing)
with torch.no_grad():
    test_outputs = model(test_data)
    predicted = (test_outputs > 0.5).float() # Convert probabilities to binary predictions (0 or 1)

    accuracy = (predicted == test_labels).sum() / len(test_labels)
    print(f'Test Accuracy: {accuracy:.4f}')
    print("Predictions", predicted.flatten())

    print("Labels     ", test_labels.flatten())

    # Separate margins for predictions of 1 and 0
    margins_ones = test_outputs[predicted == 1] - 0.5
    margins_zeros =  0.5 - test_outputs[predicted == 0] 


    # Calculate and print statistics for margins of 1s
    if margins_ones.numel() > 0:  # Check if there are any predictions of 1
        min_margin_ones = margins_ones.min().item()
        max_margin_ones = margins_ones.max().item()
        avg_margin_ones = margins_ones.mean().item()

        print(f"Min Margin (Ones): {min_margin_ones:.2f}")
        print(f"Max Margin (Ones): {max_margin_ones:.2f}")
        print(f"Avg Margin (Ones): {avg_margin_ones:.2f}")
        print("Margins (Ones):", margins_ones.flatten().numpy())
    else:
        print("No predictions of 1 in the test dataset.")

    # Calculate and print statistics for margins of 0s
    if margins_zeros.numel() > 0:  # Check if there are any predictions of 0

        min_margin_zeros = margins_zeros.min().item()
        max_margin_zeros = margins_zeros.max().item()
        avg_margin_zeros = margins_zeros.mean().item()

        print(f"Min Margin (Zeros): {min_margin_zeros:.2f}")
        print(f"Max Margin (Zeros): {max_margin_zeros:.2f}")
        print(f"Avg Margin (Zeros): {avg_margin_zeros:.2f}")
        print("Margins (Zeros):", margins_zeros.flatten().numpy())

    else:
        print("No predictions of 0 in the test dataset.")
########################  ADD EXPORT WORKING-MODEL WEIGHTS TO PARTIY-[hyperparameters]NN_weights.bin HERE ########################

########################  ADD A USER-INPUT (BINARY SEQUENCE) MODE HERE ########################