import random # broken when this model is running on GPU
#6, opened by darkmatter2222
Crazy. I have some code that uses Python's random module around my use of this model, and random consistently produces the same values whenever the model is running. Switching to NumPy's random fixed it:
import numpy as np # random is broken from the granite model
random = np.random.default_rng()
I'm sure all you nerds want code examples. TBH, I don't have time for anything more than this report; see the repo and my ChatGPT conversation:
https://github.com/darkmatter2222/NLU-Redact-PII
The behavior you're observing is due to the fixed seed. When you call:
random.seed(42)
at the beginning, it sets the random generator to a fixed starting point. This means that every time you run your script (or in your loop, if you reinitialize the seed), you'll get the same sequence of random numbers.
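A minimal sketch of that behavior, using only the standard library (the helper name draw_sequence is made up for illustration): seeding with the same constant replays the same sequence.

```python
import random


def draw_sequence(seed, count=5):
    """Seed the global generator, then draw `count` integers from it."""
    random.seed(seed)
    return [random.randint(1, 100) for _ in range(count)]


first_run = draw_sequence(42)
second_run = draw_sequence(42)

# Same seed, same sequence: the two lists are identical.
print(first_run == second_run)  # True
```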
To fix this:
Remove the fixed seed line if you want results that differ from run to run:
# Remove or comment out the following line:
# random.seed(42)
Or, if you want reproducibility across runs but still different numbers within one run, set the seed once, outside the loop. (If you seed with a constant before an infinite loop, the sequence is deterministic: you won't get the same number on every iteration, but you will get the same sequence each time you run the script.)
If you need variability across runs, it's best to remove the fixed seed altogether, so Python seeds the generator from an OS randomness source (or from the system time if none is available).
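The difference between seeding inside the loop and seeding once can be seen in a short sketch. The function names are hypothetical. The first one mimics what would happen if some code (yours or a library's) reset the seed before every draw, which is one way to end up with the same value every time.

```python
import random


def reseeded_each_iteration(iterations=4):
    """Pitfall: resetting the seed before every draw repeats the first value."""
    values = []
    for _ in range(iterations):
        random.seed(42)
        values.append(random.randint(1, 10**6))
    return values


def seeded_once(iterations=4):
    """Seed once, then let the generator advance through its sequence."""
    random.seed(42)
    return [random.randint(1, 10**6) for _ in range(iterations)]


print(reseeded_each_iteration())  # one value, repeated
print(seeded_once())              # a reproducible sequence of different values
```

If the output of a real loop looks like the first line, something is probably resetting the generator's state between draws.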
For example, update your code to:
import random
# Comment out the fixed seed so each run starts from a fresh, unpredictable seed:
# random.seed(42)
# Import data generators and helper functions from your package.
from redact.data_generators import (
    generate_people_name,
    generate_card_number,
    generate_account_number,
    generate_ssn,
    generate_government_id,
    generate_dob,
    generate_password,
    generate_tax_id,
    generate_phone_number,
    generate_address,
    generate_email_address,
    generate_ip,
    generate_passport,
    generate_driver_license,
    add_noise
)
#from redact.llama_integration import LlamaGenerator, validate_sentence
from redact.granite_intergration import GraniteGenerator, validate_sentence
from redact.synthetic_data import SyntheticDataWriter
# Map each field to its generator function.
generators = {
    "People Name": generate_people_name,
    "Card Number": generate_card_number,
    "Account Number": generate_account_number,
    "Social Security Number": generate_ssn,
    "Government ID Number": generate_government_id,
    "Date of Birth": generate_dob,
    "Password": generate_password,
    "Tax ID Number": generate_tax_id,
    "Phone Number": generate_phone_number,
    "Residential Address": generate_address,
    "Email Address": generate_email_address,
    "IP Number": generate_ip,
    "Passport": generate_passport,
    "Driver License": generate_driver_license
}
# ... rest of your code ...
With the fixed seed removed (or replaced by a seed that changes on every run), your random calls (such as random.randint(1, 5) and random.sample(...)) will produce a different sequence of numbers on each run, and different values from one iteration to the next.
You said:
Yeah, so I did that and it still isn't random.
If you removed the explicit fixed seed (e.g. random.seed(42)) but still see the same sequence every time, here are a few things to check:
Ensure No Other Seeding:
Verify that there are no other calls to random.seed() in your code or in any of the imported modules (like in your data generators). Sometimes a helper module might set its own seed, which can override your expectations.
Check Your Environment:
If you're running the code in an interactive session (such as a Jupyter notebook or a long-running REPL), a seed may have been set earlier in that session. Restarting the kernel or interpreter ensures that the random generator is freshly seeded from the OS randomness source (or the system time) by default.
Explicitly Reseed with System Time:
If you want to be sure you’re not using a fixed seed, you can reseed with None at the start of your main function. For example:
import random
random.seed(None) # This seeds using system time or an OS-specific randomness source.
This makes sure that every run will start with a different seed.
Test Random Output in Isolation:
Insert a small snippet in your main loop to print out a random number (or two) independently. For example:
print("Random test:", random.randint(1, 100))
If this number changes on each iteration, then the random module is working as expected. If it doesn’t, then something in your environment is still forcing a fixed seed.
By ensuring that no code (including in imported modules) is re-seeding the random generator and by explicitly using random.seed(None) if needed, you should see varying outputs from random.randint() and random.sample() each time your loop runs.
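One way to find out whether an imported module is re-seeding the global generator is to wrap random.seed before importing anything else and record who calls it. This is a diagnostic sketch, not part of the original code: traced_seed and seed_calls are made-up names, and the explicit random.seed(1234) call only stands in for a seed call made somewhere inside another module.

```python
import random
import traceback

_original_seed = random.seed
seed_calls = []  # one entry per intercepted call


def traced_seed(*args, **kwargs):
    """Record the arguments and call site, then delegate to the real seed()."""
    stack = traceback.format_stack(limit=6)[:-1]  # drop this wrapper's own frame
    seed_calls.append({"args": args, "kwargs": kwargs, "stack": stack})
    return _original_seed(*args, **kwargs)


random.seed = traced_seed  # install before importing the code under suspicion

# ... import and run the model and data generators here ...
random.seed(1234)  # stand-in for a seed call made by some other module

for call in seed_calls:
    print("random.seed called with", call["args"], call["kwargs"])
    print("".join(call["stack"]))
```

The wrapper only sees calls that go through the random.seed attribute after it is installed. Code that ran "from random import seed" earlier, or that restores state with random.setstate(), would not show up.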
You said:
That didn't work. I need another solution for generating random numbers.
If you find that the standard random module isn’t giving you the expected variability—even after removing or resetting the seed—you can switch to a randomness source that doesn’t depend on the seed you set manually. One common solution is to use Python’s built-in cryptographic random generator via random.SystemRandom. This class uses operating system sources (such as os.urandom) to generate random numbers that are not affected by a fixed seed.
For example, in your code you can replace:
num_fields = random.randint(1, 5)
selected_fields = random.sample(list(generators.keys()), num_fields)
with:
import random
sys_random = random.SystemRandom()
num_fields = sys_random.randint(1, 5)
selected_fields = sys_random.sample(list(generators.keys()), num_fields)
This way, you’re using SystemRandom’s methods (randint and sample), which rely on a secure OS-level source of randomness and aren’t affected by any seed you set manually.
Give that a try and see if it produces the desired variability in your random number generation.
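If the root cause is something re-seeding the module-level generator, a dedicated generator object sidesteps it, because random.seed() only touches the module's shared state. The sketch below assumes that diagnosis (the report does not confirm it) and uses a hypothetical helper, pick_fields, with a stand-in list of field names.

```python
import random

# Both objects keep their state separate from the module-level functions.
private_rng = random.Random()   # Mersenne Twister, seeded from OS entropy (or time) at creation
os_rng = random.SystemRandom()  # draws from os.urandom; seed() has no effect on it


def pick_fields(rng, field_names, max_fields=5):
    """Pick between 1 and max_fields distinct names using the given generator."""
    count = rng.randint(1, min(max_fields, len(field_names)))
    return rng.sample(field_names, count)


field_names = ["People Name", "Card Number", "Password",
               "Email Address", "IP Number", "Passport"]

for _ in range(3):
    random.seed(42)  # simulate other code resetting the global seed
    print(pick_fields(private_rng, field_names), pick_fields(os_rng, field_names))
```

A plain random.Random() instance is cheaper than SystemRandom and can still be seeded explicitly when a reproducible run is wanted, while SystemRandom can never be made reproducible.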
You said:
Apparently using this model breaks import random, so let's find a new approach.
If the standard Python random module is causing issues with your current model, you can switch to using NumPy’s random functionality. For example, replace:
num_fields = random.randint(1, 5)
selected_fields = random.sample(list(generators.keys()), num_fields)
with:
import numpy as np
# Create a new random number generator instance.
rng = np.random.default_rng()
# Generate a random integer between 1 and 5 (inclusive).
num_fields = int(rng.integers(1, 6)) # Note: high is exclusive.
# Randomly select `num_fields` items from the generator keys without replacement.
selected_fields = rng.choice(list(generators.keys()), size=num_fields, replace=False).tolist()  # .tolist() yields plain Python str keys
Explanation
np.random.default_rng():
This creates a new instance of NumPy's random Generator, which is independent from Python's built-in random module.
rng.integers(1, 6):
Generates a random integer between 1 (inclusive) and 6 (exclusive), effectively giving you a value in [1, 5].
rng.choice(..., size=num_fields, replace=False):
Randomly selects num_fields unique items (without replacement) from the list of keys.
This approach uses a NumPy Generator with its own state, separate from Python's global random module, so it should provide the randomness you need regardless of what the model's code does to the global seed.
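A short sketch of why this tends to work, assuming the model's code (or a helper such as transformers.set_seed, which seeds Python's random, NumPy's legacy global state, and PyTorch together) resets the global seeds. The report does not confirm that this is the cause. A Generator from np.random.default_rng() carries its own state, so neither random.seed() nor np.random.seed() changes what it produces.

```python
import random

import numpy as np

# Two generators created from the same explicit seed, only to make the
# comparison deterministic; real code would call default_rng() with no argument.
rng = np.random.default_rng(2024)
control = np.random.default_rng(2024)

before = rng.random(3)
random.seed(42)     # reset Python's global generator
np.random.seed(42)  # reset NumPy's legacy global generator
after = rng.random(3)

# The control generator never saw the reseeding, yet it matches draw for draw.
unaffected = (np.array_equal(before, control.random(3))
              and np.array_equal(after, control.random(3)))
print(unaffected)  # True
```

The legacy module-level functions such as np.random.randint() do share the global state that np.random.seed() resets, so the independence applies only to Generator objects.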