Sooraj Parakkattil
6 min read · Feb 21, 2021


Understanding Python’s Random Library

Photo by Markus Spiske on Unsplash

One of the long-standing challenges of computer science is achieving absolute chaos, or true randomness. The random module in Python provides a fast pseudorandom number generator based on the Mersenne Twister algorithm. The algorithm was originally developed to produce inputs for Monte Carlo simulations. It generates numbers with a nearly uniform distribution and a very long period (2**19937 - 1), which makes it suitable for a wide range of applications.

When a computer generates a “random” number, it goes through certain algorithms that will allow it to come up with that number, which means it wasn’t really random after all. “Pure randomness” can only be achieved if a computer measures a phenomenon taking place outside of its system such as in natural occurrences, from which a computer can get a “true random number.”

Generation

random() returns a random floating-point value from a generated sequence.

The value returned will always be 0 ≤ n < 1

import random

for i in range(0, 2):
    print(random.random())

The result will be different on each run.

trial 1

0.14018841324945175
0.9278818508587169

trial 2

0.12963094776229056
0.22415885475998498

To generate numbers in a specific range, use uniform().

We need to pass the minimum and maximum values to uniform(). It scales the value returned by random() using the formula:

min + (max - min) * random()

def check_uniform():
    for i in range(0, 6):
        print(random.uniform(20, 100))
# output
39.533816050191916
59.07628889880668
28.95493265586061
27.86933275723098
50.86356311876063
32.67746414800723
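As a rough check of the scaling that uniform() performs, the sketch below seeds the generator twice and compares uniform() with a hand-written low + (high - low) * random(). The helper name manual_uniform and the seed value are made up for illustration, and the exact match relies on how CPython currently implements uniform().

```python
import random

def manual_uniform(low, high):
    # Hypothetical helper: scale a value from [0, 1) into the requested range.
    return low + (high - low) * random.random()

random.seed(42)
from_library = [random.uniform(20, 100) for _ in range(3)]

random.seed(42)  # rewind to the same starting point
by_hand = [manual_uniform(20, 100) for _ in range(3)]

print(from_library == by_hand)
```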

Seeding

random() produces different values each time it is called, and its sequence only repeats after a very long period. Sometimes, though, we need the same set of values again, for example to reproduce a result or to process the same data in different ways. One way to achieve this is to generate the values once and save them for later use, but this is impractical if the data is very large.

The random module includes a seed() function that initialises the generator so that it produces the same sequence of values every time.

The seed value controls the first value produced by the formula used to generate random numbers. Since the formula is deterministic, setting the seed also fixes the full sequence that follows.

The argument to seed() can be an int, float, str, bytes or bytearray (older Python versions accepted any hashable object). If no argument is given, the operating system's source of randomness is used when one is available; otherwise the current time is used.

def seed():
    random.seed(5)
    out = []
    for i in range(0, 6):
        out.append(random.random())
    print(out)

trial 1

[0.6229016948897019, 0.7417869892607294, 0.7951935655656966, 0.9424502837770503, 0.7398985747399307, 0.922324996665417]

trial 2

[0.6229016948897019, 0.7417869892607294, 0.7951935655656966, 0.9424502837770503, 0.7398985747399307, 0.922324996665417]

As you can see, because the seed is the same in both trials, the resulting random values are also the same.
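The same effect can be seen inside a single run. In this minimal sketch, reseeding with the same value restarts the sequence, while a different seed gives a different one. The helper first_values and the seed values are arbitrary choices for illustration.

```python
import random

def first_values(seed_value, count=3):
    # Re-initialise the shared generator, then draw `count` values.
    random.seed(seed_value)
    return [random.random() for _ in range(count)]

a = first_values(5)
b = first_values(5)   # same seed, same sequence
c = first_values(6)   # different seed, different sequence

print(a == b)
print(a == c)
```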

Saving State

The internal state of the generator used by random() can be saved and later restored to control subsequent runs. getstate() returns data that can be passed to setstate() later to put the generator back into exactly that state.
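A minimal sketch of saving and restoring state might look like this. The seed value and the number of draws are arbitrary; the point is only that the values drawn after setstate() repeat the values drawn after getstate().

```python
import random

random.seed(1)
random.random()                 # advance the generator a little

saved = random.getstate()       # snapshot of the internal state
first = [random.random() for _ in range(3)]

random.setstate(saved)          # rewind to the snapshot
replay = [random.random() for _ in range(3)]

print(first == replay)
```

The object returned by getstate() can also be pickled, so the snapshot could in principle be stored between program runs.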

Random Integers

We saw how random() generates floating-point values. It is possible to convert them to integers, but the easiest way is to use randint() to generate integer values directly.

def rand_int():
    out = []
    for i in range(0, 5):
        out.append(random.randint(1, 100))
    print(out)

out : [83, 18, 14, 4, 21]

The arguments to randint() are the two ends of the range, and both ends are included. They can be positive or negative, but the first value must not be greater than the second.

randrange() is a more general form that also supports a step argument, much like range(start, stop, step). As with range(), the stop value itself is never returned. It is more efficient than picking from an actual range because the whole range is not constructed.

def rand_range():
    out = []
    for i in range(0, 5):
        out.append(random.randrange(0, 100, 5))
    print(out)

out : [25, 45, 0, 50, 60]
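To make the difference in bounds concrete, here is a small sketch that draws many values and checks them. The dice range and the fixed seed are assumptions chosen only to keep the run repeatable.

```python
import random

random.seed(0)  # fixed seed only so the run is repeatable

dice = [random.randint(1, 6) for _ in range(1000)]
steps = [random.randrange(0, 100, 5) for _ in range(1000)]

# randint(a, b) can return both a and b.
print(min(dice), max(dice))

# randrange(start, stop, step) never returns stop and, with start=0,
# only returns multiples of step.
print(all(v % 5 == 0 and 0 <= v < 100 for v in steps))
```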

Picking Random Items

One of the most common applications is choosing an item at random from a sequence. The choice() function does exactly that.

Let's toss a coin 10,000 times.

def choice_toss_coins():
    # Both counters start at zero so the totals add up to the number of tosses.
    outcomes = {
        'head': 0,
        'tail': 0,
    }
    sides = list(outcomes.keys())
    for i in range(10000):
        outcomes[random.choice(sides)] += 1
    print(outcomes)

out : {'head': 5034, 'tail': 4966}
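As an alternative sketch, not the approach used above, the same experiment can be written with random.choices() (available since Python 3.6), which draws k items with replacement in one call, and collections.Counter to tally them. The seed value is arbitrary.

```python
import random
from collections import Counter

random.seed(7)  # arbitrary seed so the run is repeatable

# choices() draws with replacement, so each toss is independent.
tosses = random.choices(['head', 'tail'], k=10000)
outcomes = Counter(tosses)

print(outcomes)
```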

Permutations

Suppose you are assigning seat positions at random to 10 or more individuals. Using choice() leaves open the possibility of the same seat being assigned twice, which would be a disaster in a ticketing system, so we need a way to prevent that. That is where shuffle() comes in. After calling shuffle() we can pop a value off the list, which removes any possibility of repetition.

def shuffle_seats():
    seats = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for i in range(10):
        random.shuffle(seats)
        print(f'Unallocated seats : {seats}')
        print(f'seat for {i} is : {seats.pop()}')

out :

Unallocated seats : [8, 3, 4, 5, 10, 9, 6, 7, 1, 2]
seat for 0 is : 2
Unallocated seats : [9, 8, 6, 7, 5, 3, 1, 10, 4]
seat for 1 is : 4
Unallocated seats : [9, 6, 5, 3, 10, 7, 8, 1]
seat for 2 is : 1
Unallocated seats : [9, 6, 3, 8, 7, 10, 5]
seat for 3 is : 5
Unallocated seats : [9, 3, 10, 8, 6, 7]
seat for 4 is : 7
Unallocated seats : [9, 6, 3, 10, 8]
seat for 5 is : 8
Unallocated seats : [10, 6, 9, 3]
seat for 6 is : 3
Unallocated seats : [10, 6, 9]
seat for 7 is : 9
Unallocated seats : [10, 6]
seat for 8 is : 6
Unallocated seats : [10]
seat for 9 is : 10
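Shuffling before every pop is not strictly needed; a single shuffle already puts the seats in random order. A possible variation, with a hypothetical helper name allocate_seats, is sketched here.

```python
import random

def allocate_seats(people, seats):
    # Work on a copy so the caller's list is left untouched.
    remaining = list(seats)
    random.shuffle(remaining)          # one shuffle is enough
    # pop() removes the seat, so it cannot be handed out twice.
    return {person: remaining.pop() for person in people}

allocation = allocate_seats(range(10), range(1, 11))
print(allocation)
```

Note that this sketch assumes there are at least as many seats as people; otherwise pop() raises IndexError.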

Sampling

In many situations we need a sample from a group of data or a large dataset. sample() picks items without replacement, so no item position is chosen twice, and it leaves the input sequence unmodified.

The algorithm takes into account the sizes of the input and the sample to produce the result as efficiently as possible.

Consider a big company with 10,000 employees that is auditing employee performance. It is not possible to audit all 10,000 employees, so if the data is available we can simply call sample() to pick the required number of employees for the audit.
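The audit scenario can be sketched with sample() as follows. The employee IDs and the audit size of 50 are invented for the example.

```python
import random

# Hypothetical employee IDs standing in for real records.
employees = [f"EMP{n:05d}" for n in range(1, 10001)]

audit_size = 50  # assumed number of employees the auditors can review
selected = random.sample(employees, audit_size)

print(selected[:5])
print(len(set(selected)) == audit_size)   # no employee picked twice
print(len(employees))                     # the input list is unchanged
```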

SystemRandom

Some operating systems provide their own random number generator, which has access to more sources of entropy that can be fed into the generator. We can use it through the SystemRandom class. It has the same API as Random but uses os.urandom() to generate the values that form the basis of all the other methods.

def systemrandom():
    print("Initialize")
    r1 = random.SystemRandom()
    r2 = random.SystemRandom()
    for i in range(3):
        print(r1.random(), r2.random())
    print("same Seed")
    seed = 512
    r1 = random.SystemRandom(seed)
    r2 = random.SystemRandom(seed)
    for i in range(3):
        print(r1.random(), r2.random())

out:

Initialize
0.7523357670573114 0.03522564078501078
0.8414854568300898 0.8369739532180619
0.520680350066864 0.7750489506980248
same Seed
0.876654185758468 0.5240599834986589
0.30129177015803155 0.9240994007176061
0.8379247607581471 0.728167158789879

The sequences produced are not reproducible, because the randomness comes from the system rather than from software state. In other words, seed() has no effect, and getstate() and setstate() are not supported (they raise NotImplementedError).
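The lack of software state can be checked directly. In this small sketch, seed() is accepted but ignored, and asking for the state raises NotImplementedError. The variable names are arbitrary.

```python
import random

sys_rng = random.SystemRandom()

sys_rng.seed(512)              # accepted, but silently ignored
value = sys_rng.random()       # still drawn from os.urandom()
print(0.0 <= value < 1.0)

try:
    sys_rng.getstate()
    has_state = True
except NotImplementedError:
    # There is no software state to save or restore.
    has_state = False

print(has_state)
```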

Non-uniform Distributions

The uniform distribution produced by random() is useful for many purposes, but other distributions model certain situations more accurately.

Normal

Used for non-uniform continuous values.

e.g. grades, height, weight.

When plotted, the results have the distinctive shape of a bell, i.e. the bell curve. Two functions for generating such values are normalvariate() and gauss().

lognormvariate() produces values whose logarithm is normally distributed.
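As an illustration, the sketch below draws heights with gauss() and checks that the sample mean and standard deviation land near the requested values. The mean of 170 and standard deviation of 10 are assumed numbers, not real data.

```python
import random
import statistics

random.seed(3)  # arbitrary seed so the run is repeatable

# Assumed example: heights in cm with mean 170 and standard deviation 10.
heights = [random.gauss(170, 10) for _ in range(10000)]

print(round(statistics.mean(heights), 1))
print(round(statistics.stdev(heights), 1))

# lognormvariate(mu, sigma): the logarithm of each value is normally
# distributed, so every value is positive.
samples = [random.lognormvariate(0, 0.5) for _ in range(5)]
print(all(v > 0 for v in samples))
```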

Approximation

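triangular() is used as an approximate distribution for small sample sizes. We give it the known minimum and maximum values, where the distribution has its low points, and optionally the mode, where it peaks (by default the midpoint).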
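For instance, a task estimate with a minimum, a maximum and a most likely value could be sketched like this. The numbers are assumptions for illustration.

```python
import random

random.seed(11)  # arbitrary seed so the run is repeatable

# Assumed example: a task that takes between 2 and 10 days, most likely 4.
# The argument order is triangular(low, high, mode).
durations = [random.triangular(2, 10, 4) for _ in range(10000)]

print(min(durations) >= 2 and max(durations) <= 10)
print(round(sum(durations) / len(durations), 2))   # close to (2 + 10 + 4) / 3
```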

Exponential

expovariate() produces an exponential distribution, useful for simulating arrival or interval times in homogeneous Poisson processes such as radioactive decay or requests coming into a server. paretovariate() is useful for modelling the allocation of resources among individuals (wealth distribution, demand for professions, etc.).
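A rough sketch of simulating request arrivals with expovariate() follows. The rate of 2 requests per second is an assumed figure; the argument to expovariate() is the rate, so the average gap comes out near 1 / rate.

```python
import random

random.seed(21)  # arbitrary seed so the run is repeatable

rate = 2.0   # assumed average of 2 requests per second

# Gaps between consecutive requests follow an exponential distribution.
gaps = [random.expovariate(rate) for _ in range(10000)]

# Turn gaps into arrival times by accumulating them.
arrivals = []
clock = 0.0
for gap in gaps:
    clock += gap
    arrivals.append(clock)

print(round(sum(gaps) / len(gaps), 3))   # close to 1 / rate
```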

Angular

vonmisesvariate() produces values from the von Mises (circular normal) distribution, which is used to model cyclic quantities such as angles, calendar days and times of day.
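As a small sketch, angles clustered around a mean direction can be drawn as follows. The mean angle and the concentration parameter kappa are assumed values; the function works in radians and returns values between 0 and 2*pi.

```python
import math
import random

random.seed(5)  # arbitrary seed so the run is repeatable

# Assumed example: wind directions clustered around 90 degrees (pi / 2 radians).
mu = math.pi / 2
kappa = 4.0   # larger kappa means tighter clustering around mu
angles = [random.vonmisesvariate(mu, kappa) for _ in range(1000)]

print(all(0 <= a <= 2 * math.pi for a in angles))
```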

Sizes

betavariate() generates values with a beta distribution, commonly used in Bayesian statistics and task-duration modelling.

The gamma distribution is produced with gammavariate(), e.g. for waiting times, rainfall amounts and computational errors.

weibullvariate() is used for failure analysis and weather forecasting, and also for the distribution of sizes of particles or other objects.
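The three functions can be tried side by side. In this sketch the shape and scale parameters are arbitrary picks, and the comments note the theoretical means the samples should approach.

```python
import random

random.seed(9)  # arbitrary seed so the run is repeatable

n = 10000
# betavariate(alpha, beta): values always fall in [0, 1].
beta_vals = [random.betavariate(2, 5) for _ in range(n)]
# gammavariate(alpha, beta): shape alpha, scale beta, mean alpha * beta.
gamma_vals = [random.gammavariate(2.0, 3.0) for _ in range(n)]
# weibullvariate(alpha, beta): scale alpha, shape beta, values are non-negative.
weibull_vals = [random.weibullvariate(1.0, 1.5) for _ in range(n)]

print(round(sum(beta_vals) / n, 3))    # close to 2 / (2 + 5)
print(round(sum(gamma_vals) / n, 2))   # close to 2.0 * 3.0
print(min(weibull_vals) >= 0)
```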

Related reading :

https://docs.python.org/3/library/random.html

https://futurism.com/scientists-enable-computers-to-generate-true-random-numbers

https://en.wikipedia.org/wiki/Mersenne_Twister

The Python 3 Standard Library by Example, Doug Hellmann
