MartialTerran committed on
Commit
cc47b1c
·
verified ·
1 Parent(s): 5a9a434

Create PARITY-calculatingNN_Schmidhuber1.py

  1. PARITY-calculatingNN_Schmidhuber1.py +244 -0
PARITY-calculatingNN_Schmidhuber1.py ADDED
@@ -0,0 +1,244 @@
+ # Python script to compute parity of an N-bit binary number using an N-input neural network with L hidden layers, and one output neuron, in PyTorch.
+ # Customizable parameters:
+ N = 5 # Default number of input bits (and data bit-width)
+ L = 2 # Number of hidden layers
+ hidden_layer_size = 10 # Default number of neurons per hidden layer
+ epochs = 10000
+ learning_rate = 0.003
+ min_loss_threshold = 0.01 # Training stops early once the loss falls below this value
+
+ # In essence, parity is about the total count of 1s, while XOR is a convenient operation to compute the parity bit.
+
+
+ ########################
+
+ """
+ Python script to compute parity of an N-bit binary number using an N-input neural network with L hidden layers, and one output neuron, in PyTorch.
+
+ The Python script calculates even parity. The code that determines the parity is within the generate_data function:
+
+ parity = sum(bits) % 2 # Even parity
+
+ sum(bits) counts the 1s in the input bit sequence, and the modulo operator % 2 returns 0 if that count is even and 1 if it is odd. The result is assigned to the parity variable, so parity is 1 exactly when the data bits contain an odd number of 1s. This is an even parity calculation: the parity bit is set so that the total number of 1s in the sequence (including the parity bit itself) becomes even.[1]
+
+ Even parity is not about having an odd number of bits set to one; it is about ensuring that the total number of 1s (data bits + parity bit) is even. This is how error detection with parity bits works: the receiver expects to see an even number of 1s, and an odd count flags a transmission error.[1] The script generates random data of bit-width N and uses the computed parity bit as the label for each sample. The combined sequence (of length N + 1) is never built or used in the script. It is the job of the neural network to discover the parity calculation on its own during training, where it is shown only the N input bits (without the parity bit) as input and the computed parity bit as the target.
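The relationship between the bit count, XOR, and the appended parity bit can be checked in a few lines of plain Python. This is a minimal sketch, not part of the script: the helper names even_parity_bit and xor_parity_bit and the variable codeword are illustrative assumptions.

```python
from functools import reduce
from operator import xor

# Hypothetical helper: the even-parity bit is 1 when the count of 1s is odd.
def even_parity_bit(bits):
    return sum(bits) % 2

# Hypothetical helper: the same value obtained by XOR-ing all bits together.
def xor_parity_bit(bits):
    return reduce(xor, bits, 0)

bits = [1, 0, 1, 1, 0]        # three 1s, so the count is odd
p = even_parity_bit(bits)     # parity bit for these data bits
codeword = bits + [p]         # the N + 1 bit sequence a transmitter would send

# The parity bit, its XOR form, and the parity of the full codeword (always 0).
print(p, xor_parity_bit(bits), sum(codeword) % 2)  # prints: 1 1 0
```

The last printed value is 0 for any input, which is the property a receiver checks under even parity.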
+
+ Includes modes for inference and pretraining with explicit gradient computations and backpropagation. Displays real-time error during training in a popup window.
+
+ Inspired by: [1] Aug 28, 2024 YouTube interview of Juergen Schmidhuber at youtube =DP454c1K_vQ
+ See also (seven years ago) True Artificial Intelligence will change everything | Juergen Schmidhuber | TEDxLakeComo www.youtube =-Y7PLaxXUrs
+
+ [Jürgen Schmidhuber, the father of generative AI shares his groundbreaking work in deep learning and artificial intelligence. In this exclusive interview, he discusses the history of AI, some of his contributions to the field, and his vision for the future of intelligent machines. Schmidhuber offers unique insights into the exponential growth of technology and the potential impact of AI on humanity and the universe.]
+
+ In this interview, Schmidhuber stated that LLMs cannot compute "parity" of bits in a binary number sequence, but that "Recurrent" NNs (RNNs) can compute "parity". I wanted to know whether a simple feed-forward NN can compute "parity". (If so, perhaps LLMs actually can compute "parity" if specifically trained to do so.)
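Independently of training, one way to see that a feed-forward network with ReLU units can represent parity for a fixed N is to pick the weights by hand. The sketch below is an illustration under stated assumptions, not the model this script trains: it uses one hidden layer of N ReLU units (every input weight 1, bias -k for unit k) and a linear output with weights +1, -2, +2, -2, ..., which together form a triangle wave over the number of 1s. The names relu, handcrafted_parity, and N_BITS are hypothetical.

```python
from itertools import product

def relu(x):
    return x if x > 0 else 0

# Hand-chosen weights (an assumption for illustration, not learned values).
def handcrafted_parity(bits):
    n = len(bits)
    s = sum(bits)                                  # every input weight is 1
    hidden = [relu(s - k) for k in range(n)]       # unit k has bias -k
    # Output weights +1, -2, +2, -2, ... make the output alternate 0, 1, 0, 1, ... over s.
    out_weights = [1] + [2 * (-1) ** k for k in range(1, n)]
    return sum(w * h for w, h in zip(out_weights, hidden))

N_BITS = 5
# Compare against sum(bits) % 2 on all 2**N_BITS possible inputs.
mismatches = [b for b in product([0, 1], repeat=N_BITS)
              if handcrafted_parity(b) != sum(b) % 2]
print(len(mismatches))  # prints: 0
```

This only shows that such weights exist for a fixed input width; whether gradient descent finds a comparable solution is what the training run in the script tests empirically.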
+
+ Method of the python script informed by:
+ Create a Basic Neural Network Model - Deep Learning with PyTorch 5 - YouTube =JHWqWIoac2I (2023-06-05)
+ and
+ Building a Neural Network with PyTorch in 15 Minutes youtube =mozBidd58VQ
+
+ The loss generally drops smoothly, with some bumps visible in the graph, depending on the "random" weights preloaded into the model each run.
+ Example results:
+ Epoch [1000/1000], Loss: 0.0395
+ Test Accuracy: 1.0000
+ [Once, model loss fell to 0.0642 after 2000 epochs and did not fall below 0.0637 after 9999 epochs. The final sample-tested prediction Accuracy: 0.9700]
+ This suggests that the model typically does not need to drive the loss all the way below 0.01, because earlier in training the final neuron's outputs for the zero and one cases are already separated by margins greater than 0.5. The script is not specifically written to optimize these margins. The margins just emerged.
+
+ done loading libraries
+ Epoch [100/1000], Loss: 0.6247
+ Epoch [200/1000], Loss: 0.4112
+ Epoch [300/1000], Loss: 0.1990
+ Epoch [400/1000], Loss: 0.0849
+ Epoch [500/1000], Loss: 0.0413
+ Epoch [600/1000], Loss: 0.0250
+ Epoch [700/1000], Loss: 0.0172
+ Epoch [800/1000], Loss: 0.0128
+ Epoch [900/1000], Loss: 0.0100
+ Epoch [1000/1000], Loss: 0.0081
+
+ Test Accuracy: 1.0000
+ Predictions tensor([1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0.,
+ 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0.,
+ 0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1.,
+ 0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
+ 0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1.,
+ 0., 1., 0., 1., 0., 0., 0., 1., 0., 0.])
+ Labels tensor([1., 1., 1., 0., 0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0.,
+ 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 0., 0.,
+ 0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0., 1.,
+ 0., 1., 0., 0., 1., 1., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
+ 0., 1., 0., 1., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1.,
+ 0., 1., 0., 1., 0., 0., 0., 1., 0., 0.])
+
+
+ The line that calculates test_outputs is:
+ test_outputs = model(test_data)
+
+ This line calls the model object (an instance of the ParityNet class), which in turn runs its forward method. Because the script registers no hooks on the model, the result is the same as test_outputs = model.forward(test_data). Here test_data is a tensor holding the bit sequences whose parity is to be predicted.
+
+ The number of neurons and activations that directly contribute to generating test_outputs depends on the network architecture defined by N, L, and hidden_layer_size.
+
+ Output Layer: The final layer has one output neuron (because we are predicting a single binary value - the parity). This neuron uses a Sigmoid activation function.
+
+ Last Hidden Layer: This layer has hidden_layer_size neurons, each using a ReLU activation function. The output of each neuron in this layer feeds directly into the single output neuron.
+
+ Previous Hidden Layers: If L > 1, there are L-1 previous hidden layers, each also with hidden_layer_size neurons and ReLU activations. The activations of each layer feed into the next.
+
+ Input Layer: The input layer consists of N nodes that represent the input bits and are directly connected to the first hidden layer. One might consider a linear (identity) activation function to be applied at the input layer.
+
+ So, to generate test_outputs, you have the following activations:
+
+ hidden_layer_size * L ReLU activations in the hidden layers.
+
+ 1 Sigmoid activation in the output layer.
+
+ N linear activations in the input layer (optional).
+
+ In summary, the hidden_layer_size neurons with ReLU activations in the last hidden layer and the 1 neuron with a Sigmoid activation in the output layer immediately generate the test_outputs values. All the other (L-1) * hidden_layer_size neurons with ReLU activations in the preceding hidden layers contribute indirectly by feeding into the last hidden layer. Each of the input bits is treated individually via the N nodes of the input layer, which might be considered to have a linear activation or no activation at all.
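The counts above can be tallied for any choice of N, L, and hidden_layer_size. The following is a minimal sketch with a hypothetical helper named count_units. The trainable-parameter total is an addition not stated in the text above; it assumes each nn.Linear layer keeps its default bias term.

```python
# Hypothetical helper: tally activations and trainable parameters for the
# architecture described above (assumes every nn.Linear has a bias vector).
def count_units(n_inputs, n_hidden_layers, hidden_size):
    relu_activations = n_hidden_layers * hidden_size
    sigmoid_activations = 1
    # First hidden layer: weights from the inputs plus one bias per neuron.
    params = n_inputs * hidden_size + hidden_size
    # Each further hidden layer: hidden-to-hidden weights plus biases.
    params += (n_hidden_layers - 1) * (hidden_size * hidden_size + hidden_size)
    # Output layer: one weight per last-hidden neuron plus one bias.
    params += hidden_size + 1
    return relu_activations, sigmoid_activations, params

# Defaults from the top of the script: N = 5, L = 2, hidden_layer_size = 10.
print(count_units(5, 2, 10))  # prints: (20, 1, 181)
```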
+
+ """
+ ########################
+
+ print("load libraries")
+ print("import torch")
+ import torch
+ print("import torch.nn as nn")
+ import torch.nn as nn
+ print("import numpy as np")
+ import numpy as np
+ print("import matplotlib.pyplot as plt # For the popup error plot")
+ import matplotlib.pyplot as plt # For the popup error plot
+ print("import random")
+ import random
+ print("done loading libraries")
+
+ # Generate training data (parity bit is XOR of all data bits)
+ def generate_data(num_samples, num_bits):
+     data = []
+     labels = []
+     for _ in range(num_samples):
+         bits = [random.randint(0, 1) for _ in range(num_bits)]
+         parity = sum(bits) % 2 # Even parity: The modulo operator % 2 returns 0 if the sum is even, and 1 if the sum is odd.
+         data.append(bits)
+         labels.append(parity)
+
+     return torch.tensor(data, dtype=torch.float32), torch.tensor(labels, dtype=torch.float32).reshape(-1, 1)
+
+ train_data, train_labels = generate_data(1000, N) # 1000 N-bit numbers generated for training data.
+ test_data, test_labels = generate_data(100, N) # 100 N-bit numbers generated for test data.
+
+
+ # Define the neural network model
+ class ParityNet(nn.Module):
+     def __init__(self, input_size, hidden_size, num_hidden_layers, output_size):
+         super(ParityNet, self).__init__()
+         layers = []
+         layers.append(nn.Linear(input_size, hidden_size))
+         layers.append(nn.ReLU()) # Activation function
+         for _ in range(num_hidden_layers - 1):
+             layers.append(nn.Linear(hidden_size, hidden_size))
+             layers.append(nn.ReLU()) # Activation function
+         layers.append(nn.Linear(hidden_size, output_size))
+         layers.append(nn.Sigmoid()) # Output layer activation function for binary classification
+         self.layers = nn.Sequential(*layers)
+
+
+     def forward(self, x):
+         return self.layers(x)
+
+
+ # Create the model instance
+ model = ParityNet(N, hidden_layer_size, L, 1)
+
+ # Loss function and optimizer
+ criterion = nn.BCELoss() # Binary Cross Entropy Loss
+ optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
+
+ # Pretraining loop
+ losses = [] # Store losses for plotting
+ plt.ion() # Turn on interactive plotting
+ fig, ax = plt.subplots() # Create plot objects outside the loop
+
+ ######################## TRAINING ########################
+ for epoch in range(epochs):
+     # Forward pass
+     outputs = model(train_data)
+     loss = criterion(outputs, train_labels)
+
+     # Explicit gradient computation and backpropagation
+     optimizer.zero_grad()
+     loss.backward()
+     optimizer.step()
+
+
+     losses.append(loss.item())
+
+     ################ ADD: if loss.item() < (min_loss_threshold + 0.01), save the training samples the model still gets wrong to failed_train_data[] and append them to failed_train_data.txt. This captures the difficult training cases in an external file shortly before training ends. In future versions, use these hardest-to-train values to give the model extra training. ################
+
+     if loss.item() < min_loss_threshold:
+         print(f"Reached minimum loss threshold of {min_loss_threshold} at epoch {epoch+1}. Stopping training.")
+         break # Exit the training loop
+
+     if (epoch + 1) % 100 == 0:
+         print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')
+
+         # Update the plot in real time
+         ax.clear() # Clear previous plot
+         ax.plot(losses) # plot loss history (one point per epoch)
+         ax.set_title("Training Loss")
+         ax.set_xlabel("Epoch") # losses holds one value per epoch, so the x-axis is in epochs
+         ax.set_ylabel("Loss")
+         plt.draw()
+         plt.pause(0.01) # Brief pause for plot to update
+
+
+ plt.ioff() # Turn off interactive mode so plot persists after training is done
+ plt.show() # show final plot.
+
+ ######################## INFERENCE-TESTING ########################
+ # Inference (Testing)
+ with torch.no_grad():
+     test_outputs = model(test_data)
+     predicted = (test_outputs > 0.5).float() # Convert probabilities to binary predictions (0 or 1)
+
+ accuracy = (predicted == test_labels).float().mean().item() # Fraction of test samples predicted correctly
+ print(f'Test Accuracy: {accuracy:.4f}')
+ print("Predictions", predicted.flatten())
+
+ print("Labels ", test_labels.flatten())
+
+ # Separate margins for predictions of 1 and 0
+ margins_ones = test_outputs[predicted == 1] - 0.5
+ margins_zeros = 0.5 - test_outputs[predicted == 0]
+
+
+ # Calculate and print statistics for margins of 1s
+ if margins_ones.numel() > 0: # Check if there are any predictions of 1
+     min_margin_ones = margins_ones.min().item()
+     max_margin_ones = margins_ones.max().item()
+     avg_margin_ones = margins_ones.mean().item()
+
+     print(f"Min Margin (Ones): {min_margin_ones:.2f}")
+     print(f"Max Margin (Ones): {max_margin_ones:.2f}")
+     print(f"Avg Margin (Ones): {avg_margin_ones:.2f}")
+     print("Margins (Ones):", margins_ones.flatten().numpy())
+ else:
+     print("No predictions of 1 in the test dataset.")
+
+ # Calculate and print statistics for margins of 0s
+ if margins_zeros.numel() > 0: # Check if there are any predictions of 0
+
+     min_margin_zeros = margins_zeros.min().item()
+     max_margin_zeros = margins_zeros.max().item()
+     avg_margin_zeros = margins_zeros.mean().item()
+
+     print(f"Min Margin (Zeros): {min_margin_zeros:.2f}")
+     print(f"Max Margin (Zeros): {max_margin_zeros:.2f}")
+     print(f"Avg Margin (Zeros): {avg_margin_zeros:.2f}")
+     print("Margins (Zeros):", margins_zeros.flatten().numpy())
+
+ else:
+     print("No predictions of 0 in the test dataset.")
+ ######################## ADD EXPORT WORKING-MODEL WEIGHTS TO PARITY-[hyperparameters]NN_weights.bin HERE ########################
+
+ ######################## ADD A USER-INPUT (BINARY SEQUENCE) MODE HERE ########################
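For the user-input mode noted in the last placeholder, one possible first step is validating a typed bit string before it reaches the model. The sketch below is only an assumption about how such a mode might start: the helper name parse_bits and its error message are hypothetical, and it deliberately stops short of calling the model so that it runs without PyTorch.

```python
# Hypothetical helper: turn user text such as "1 0 1 1 0" or "10110" into a
# list of floats, rejecting input that is not exactly num_bits binary digits.
def parse_bits(text, num_bits):
    cleaned = text.strip().replace(" ", "")
    if len(cleaned) != num_bits or any(ch not in "01" for ch in cleaned):
        raise ValueError(f"expected exactly {num_bits} characters, each 0 or 1")
    return [float(ch) for ch in cleaned]

bits = parse_bits("1 0 1 1 0", 5)
# Reference parity computed directly, for comparison with a model prediction.
print(bits, int(sum(bits)) % 2)  # prints: [1.0, 0.0, 1.0, 1.0, 0.0] 1
```

In the script, the parsed list could then be wrapped as torch.tensor([bits], dtype=torch.float32) and passed to model inside a torch.no_grad() block, with the output thresholded at 0.5 as in the test section.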