MartialTerran
committed on
Create PARITY-calculatingNN_Schmidhuber1.py
PARITY-calculatingNN_Schmidhuber1.py
ADDED
@@ -0,0 +1,244 @@
# Python script to compute the parity of an N-bit binary number using an N-input neural network with L hidden layers and one output neuron, in PyTorch.
# Customizable parameters:
N = 5  # Default number of input bits (and data bit-width)
L = 2  # Number of hidden layers
hidden_layer_size = 10  # Default number of neurons per hidden layer
epochs = 10000
learning_rate = 0.003
min_loss_threshold = 0.01  # Training stops once the loss falls below this value

# In essence, parity is about the total count of 1s, while XOR is a convenient operation to compute the parity bit.


########################

"""
Python script to compute the parity of an N-bit binary number using an N-input neural network with L hidden layers and one output neuron, in PyTorch.

The Python script calculates even parity. The code that determines the parity is within the generate_data function:

parity = sum(bits) % 2 # Even parity

sum(bits) counts the 1s in the input bit sequence. The modulo operator % 2 returns 0 if that count is even and 1 if it is odd, and the result is assigned to the parity variable. Because parity is 1 only when an odd number of data bits are set to 1, this is an even parity calculation: the parity bit is set so that the total number of 1s in the sequence (including the parity bit itself) becomes even.[1]

Even parity is not about having an odd number of bits set to one; it is about ensuring that the total number of 1s (data bits + parity bit) is even. This is how error detection works with parity bits: the receiver knows it should always see an even number of 1s, and if it sees an odd number, it flags a transmission error.[1] The script generates random data with a configurable bit-width N and computes this parity bit as the label for each sample. The combined sequence (of length N + 1) is never built or used in the script. It is the job of the neural network to discover the parity calculation on its own during training, where it is shown only the N input bits and the calculated parity bit as the target.
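As a small illustration of the even-parity convention described above (a sketch only; the helper names even_parity_bit and passes_parity_check are made up here and do not appear in the script):

```python
from functools import reduce
from operator import xor

def even_parity_bit(bits):
    # Same rule as generate_data: 1 when the count of 1s is odd, else 0.
    return sum(bits) % 2

def passes_parity_check(codeword):
    # A receiver using even parity accepts a word only if its total number of 1s is even.
    return sum(codeword) % 2 == 0

data = [1, 0, 1, 1, 0]                 # three 1s, so the parity bit is 1
parity = even_parity_bit(data)
codeword = data + [parity]             # N + 1 bits, now with an even number of 1s

# XOR-folding the data bits gives the same value as sum(bits) % 2.
assert parity == reduce(xor, data)

corrupted = codeword.copy()
corrupted[2] ^= 1                      # flip one bit to simulate a transmission error

print(parity)                          # 1
print(passes_parity_check(codeword))   # True
print(passes_parity_check(corrupted))  # False
```

A single flipped bit is always detected this way; two flipped bits cancel out and pass the check unnoticed.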

Includes modes for inference and pretraining with explicit gradient computations and backpropagation. Displays real-time error during training in a popup window.

Inspired by: [1] Aug 28, 2024 YouTube interview of Juergen Schmidhuber at youtube =DP454c1K_vQ
See also (seven years ago) True Artificial Intelligence will change everything | Juergen Schmidhuber | TEDxLakeComo www.youtube =-Y7PLaxXUrs

[Jürgen Schmidhuber, the father of generative AI, shares his groundbreaking work in deep learning and artificial intelligence. In this exclusive interview, he discusses the history of AI, some of his contributions to the field, and his vision for the future of intelligent machines. Schmidhuber offers unique insights into the exponential growth of technology and the potential impact of AI on humanity and the universe.]

In this interview, Schmidhuber stated that LLMs cannot compute the "parity" of the bits in a binary number sequence, but that recurrent NNs (RNNs) can. I wanted to know whether a simple feed-forward NN can compute parity. (If so, perhaps LLMs actually can compute parity if specifically trained to do so.)
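One way to see that a feed-forward network can at least represent parity for a fixed N, independent of any training: a single hidden layer of N ReLU units with hand-picked weights already reproduces sum(bits) % 2 exactly. The sketch below is a standard construction written in plain Python, not the network this script trains, and the names relu and handmade_parity_net are illustrative only.

```python
from itertools import product

def relu(z):
    return max(0.0, z)

def handmade_parity_net(bits):
    # Hidden unit k computes ReLU(s - k), where s is the number of 1s in the input
    # (every input weight is 1 and the bias of unit k is -k), for k = 0 .. N-1.
    n = len(bits)
    s = float(sum(bits))
    hidden = [relu(s - k) for k in range(n)]
    # Output weights +1, -2, +2, -2, ... turn those ramps into a sawtooth that is
    # 0 at even s and 1 at odd s.
    out_weights = [1.0] + [2.0 * (-1) ** k for k in range(1, n)]
    return sum(w * h for w, h in zip(out_weights, hidden))

N = 5
mismatches = [
    bits for bits in product([0, 1], repeat=N)
    if handmade_parity_net(bits) != sum(bits) % 2
]
print(len(mismatches))  # 0: all 2**N inputs agree with sum(bits) % 2
```

Whether gradient descent actually finds such a solution from random initial weights is a separate question, and that is what the script tests empirically.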

Method of the Python script informed by:
Create a Basic Neural Network Model - Deep Learning with PyTorch 5 - YouTube =JHWqWIoac2I (2023-06-05)
and
Building a Neural Network with PyTorch in 15 Minutes youtube =mozBidd58VQ

The loss, as defined, drops smoothly apart from some bumps visible in the graph, which vary with the random weights the model is initialized with on each run.
Example results:
Epoch [1000/1000], Loss: 0.0395
Test Accuracy: 1.0000
[Once, the model loss fell to 0.0642 after 2000 epochs and did not fall below 0.0637 after 9999 epochs. The final sample-tested prediction accuracy was 0.9700.]
This shows that the model typically does not need to drive the loss all the way below 0.01, because well before that point the final neuron's outputs for the zero and one cases are already separated by more than 0.5. The script is not specifically written to optimize these margins; they simply emerged.
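The relation between the reported loss and the 0.5 decision threshold can be checked with a little arithmetic. Binary cross-entropy charges -ln(p) per sample, where p is the probability the model assigns to the correct label, so a sample is classified correctly whenever its own loss is below ln 2 (about 0.693). The sketch below uses only the math module; treating every sample as if it had the same loss is a simplifying assumption, and the function name confidence_from_bce is hypothetical.

```python
import math

def confidence_from_bce(loss):
    # Probability assigned to the correct label by a sample whose BCE loss is `loss`.
    return math.exp(-loss)

# Per-sample loss at which the prediction sits exactly on the 0.5 threshold.
print(round(math.log(2), 4))  # 0.6931

for loss in (0.0642, 0.0395, 0.01):
    p = confidence_from_bce(loss)
    # Margin as defined in this script: distance of the output from 0.5.
    print(loss, round(p, 4), round(p - 0.5, 4))
```

Under that assumption, all three loss values quoted above correspond to outputs far from the threshold, which is consistent with high accuracy at losses well above 0.01. In a real run the loss is an average, so a few samples can still sit on the wrong side of 0.5, as in the 0.9700 case.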

done loading libraries
Epoch [100/1000], Loss: 0.6247
Epoch [200/1000], Loss: 0.4112
Epoch [300/1000], Loss: 0.1990
Epoch [400/1000], Loss: 0.0849
Epoch [500/1000], Loss: 0.0413
Epoch [600/1000], Loss: 0.0250
Epoch [700/1000], Loss: 0.0172
Epoch [800/1000], Loss: 0.0128
Epoch [900/1000], Loss: 0.0100
Epoch [1000/1000], Loss: 0.0081

Test Accuracy: 1.0000
Predictions tensor([1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0.,
        0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0.,
        0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1.,
        0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
        0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1.,
        0., 1., 0., 1., 0., 0., 0., 1., 0., 0.])
Labels tensor([1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0.,
        0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0.,
        0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1.,
        0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
        0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1.,
        0., 1., 0., 1., 0., 0., 0., 1., 0., 0.])


The line that calculates test_outputs is:
test_outputs = model(test_data)

This line calls the model object (an instance of the ParityNet class). Calling an nn.Module goes through its __call__ method, which runs any registered hooks and then the forward method, so with no hooks registered (as here) the result is the same as test_outputs = model.forward(test_data). In this line, test_data is a tensor holding all the input bit sequences to check.

The number of neurons and activations that contribute to generating test_outputs depends on the network architecture defined by N, L, and hidden_layer_size.

Output Layer: The final layer has one output neuron (because we are predicting a single binary value, the parity). This neuron uses a Sigmoid activation function.

Last Hidden Layer: This layer has hidden_layer_size neurons, each using a ReLU activation function. The output of each neuron in this layer feeds directly into the single output neuron.

Previous Hidden Layers: If L > 1, there are L-1 previous hidden layers, each also with hidden_layer_size neurons and ReLU activations. The activations of each layer feed into the next.

Input Layer: The input layer consists of N nodes, one per input bit, connected directly to the first hidden layer. The code applies no activation function to them; one may think of the input layer as having a linear (identity) activation.

So, to generate test_outputs, you have the following activations:

hidden_layer_size * L ReLU activations in the hidden layers.

1 Sigmoid activation in the output layer.

N linear activations in the input layer (optional).

In summary, the hidden_layer_size neurons with ReLU activations in the last hidden layer and the one neuron with a Sigmoid activation in the output layer directly generate the test_outputs values. The other (L-1) * hidden_layer_size neurons with ReLU activations in the preceding hidden layers contribute indirectly by feeding into the last hidden layer. Each input bit is handled individually by one of the N nodes of the input layer, which can be regarded as having a linear activation or no activation at all.
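The counts above can be tallied for any choice of the three hyperparameters. The helper below is a hypothetical addition (count_network is not part of the script) that assumes the layout built by ParityNet: L hidden Linear+ReLU blocks of equal width followed by one Linear+Sigmoid output neuron. It also counts trainable weights and biases, which the text above does not discuss.

```python
def count_network(n_inputs, n_hidden_layers, hidden_size):
    # Activation counts, following the breakdown in the text.
    relu_activations = hidden_size * n_hidden_layers
    sigmoid_activations = 1

    # Each Linear layer holds (fan_in * fan_out) weights plus fan_out biases.
    first = n_inputs * hidden_size + hidden_size
    middle = (n_hidden_layers - 1) * (hidden_size * hidden_size + hidden_size)
    last = hidden_size * 1 + 1
    return {
        "relu": relu_activations,
        "sigmoid": sigmoid_activations,
        "parameters": first + middle + last,
    }

# Defaults of this script: N = 5, L = 2, hidden_layer_size = 10.
print(count_network(5, 2, 10))
# {'relu': 20, 'sigmoid': 1, 'parameters': 181}
```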

"""
########################

print("load libraries")
print("import torch")
import torch
print("import torch.nn as nn")
import torch.nn as nn
print("import numpy as np")
import numpy as np
print("import matplotlib.pyplot as plt # For the popup error plot")
import matplotlib.pyplot as plt  # For the popup error plot
print("import random")
import random
print("done loading libraries")

# Generate training data (the parity bit is the XOR of all data bits)
def generate_data(num_samples, num_bits):
    data = []
    labels = []
    for _ in range(num_samples):
        bits = [random.randint(0, 1) for _ in range(num_bits)]
        parity = sum(bits) % 2  # Even parity: % 2 returns 0 if the sum is even, and 1 if the sum is odd.
        data.append(bits)
        labels.append(parity)

    return torch.tensor(data, dtype=torch.float32), torch.tensor(labels, dtype=torch.float32).reshape(-1, 1)

train_data, train_labels = generate_data(1000, N)  # 1000 N-bit numbers generated for training data.
test_data, test_labels = generate_data(100, N)  # 100 N-bit numbers generated for test data.


# Define the neural network model
class ParityNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_hidden_layers, output_size):
        super(ParityNet, self).__init__()
        layers = []
        layers.append(nn.Linear(input_size, hidden_size))
        layers.append(nn.ReLU())  # Activation function
        for _ in range(num_hidden_layers - 1):
            layers.append(nn.Linear(hidden_size, hidden_size))
            layers.append(nn.ReLU())  # Activation function
        layers.append(nn.Linear(hidden_size, output_size))
        layers.append(nn.Sigmoid())  # Output layer activation function for binary classification
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)


# Create the model instance
model = ParityNet(N, hidden_layer_size, L, 1)

# Loss function and optimizer
criterion = nn.BCELoss()  # Binary Cross Entropy Loss
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Pretraining loop
losses = []  # Store losses for plotting
plt.ion()  # Turn on interactive plotting
fig, ax = plt.subplots()  # Create plot objects outside the loop

######################## TRAINING ########################
for epoch in range(epochs):
    # Forward pass
    outputs = model(train_data)
    loss = criterion(outputs, train_labels)

    # Explicit gradient computation and backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    losses.append(loss.item())  # One entry per epoch

    ################ ADD: if loss.item() < (min_loss_threshold + 0.01), save the failed
    # train_data values to failed_train_data[] and append them to failed_train_data.txt.
    # Thus, before the end of training, grab the problem bit values and accumulate these
    # difficult training cases in an external file. In future versions, use these
    # hardest-to-train values to somehow extra-train the model. ################

    if loss.item() < min_loss_threshold:
        print(f"Reached minimum loss threshold of {min_loss_threshold} at epoch {epoch+1}. Stopping training.")
        break  # Exit the training loop

    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

        # Update the plot in real time (redrawn every 100 epochs)
        ax.clear()  # Clear previous plot
        ax.plot(losses)  # Plot loss history
        ax.set_title("Training Loss")
        ax.set_xlabel("Epoch")  # losses holds one value per epoch, so the x-axis is in epochs
        ax.set_ylabel("Loss")
        plt.draw()
        plt.pause(0.01)  # Brief pause for plot to update


plt.ioff()  # Turn off interactive mode so plot persists after training is done
plt.show()  # Show final plot.

######################## INFERENCE-TESTING ########################
# Inference (Testing)
with torch.no_grad():
    test_outputs = model(test_data)
    predicted = (test_outputs > 0.5).float()  # Convert probabilities to binary predictions (0 or 1)

    accuracy = (predicted == test_labels).sum() / len(test_labels)
    print(f'Test Accuracy: {accuracy:.4f}')
    print("Predictions", predicted.flatten())

    print("Labels ", test_labels.flatten())

# Separate margins for predictions of 1 and 0
margins_ones = test_outputs[predicted == 1] - 0.5
margins_zeros = 0.5 - test_outputs[predicted == 0]


# Calculate and print statistics for margins of 1s
if margins_ones.numel() > 0:  # Check if there are any predictions of 1
    min_margin_ones = margins_ones.min().item()
    max_margin_ones = margins_ones.max().item()
    avg_margin_ones = margins_ones.mean().item()

    print(f"Min Margin (Ones): {min_margin_ones:.2f}")
    print(f"Max Margin (Ones): {max_margin_ones:.2f}")
    print(f"Avg Margin (Ones): {avg_margin_ones:.2f}")
    print("Margins (Ones):", margins_ones.flatten().numpy())
else:
    print("No predictions of 1 in the test dataset.")

# Calculate and print statistics for margins of 0s
if margins_zeros.numel() > 0:  # Check if there are any predictions of 0

    min_margin_zeros = margins_zeros.min().item()
    max_margin_zeros = margins_zeros.max().item()
    avg_margin_zeros = margins_zeros.mean().item()

    print(f"Min Margin (Zeros): {min_margin_zeros:.2f}")
    print(f"Max Margin (Zeros): {max_margin_zeros:.2f}")
    print(f"Avg Margin (Zeros): {avg_margin_zeros:.2f}")
    print("Margins (Zeros):", margins_zeros.flatten().numpy())

else:
    print("No predictions of 0 in the test dataset.")
######################## ADD EXPORT WORKING-MODEL WEIGHTS TO PARITY-[hyperparameters]NN_weights.bin HERE ########################

######################## ADD A USER-INPUT (BINARY SEQUENCE) MODE HERE ########################