Randomness, Birthdays, Pigeons and Twisters

What is Random? with Pseudo-Random Numbers

AzerKaanDasdemir
UX Collective

--

Calendar with PE printed on it
Photo by ready made from Pexels

Pick three numbers from 1 to 15. You are most likely to generate along the lines of 9, 3, 12. Present your numbers to a colleague now and you will mutually agree that it looks random enough. As a matter of fact, neither you nor your colleague is good at picking out or generating randomness. You have immediately came up with rules such as not picking the same number although this was never implied in the premise. When asked, almost no one is going to pick the same number between the given boundaries. You could have generated 9,9,9. But it doesn’t seem random enough. When you avoid this behavior intrinsically, you are already introducing a pattern. Which means you are steered away from randomness already. There should be no patterns, not even subtle ones to make the sequences random.

It seems like randomness is reserved for the use of nature and not its productions. Evolution, mutation, diversity, and mutants that are better adapted are all outputs of this random mechanism.

Scientists and academics actively seeking out these mechanisms and investigate them. They create models of random events, most often; on a computer. Which is made by humans, who are already very bad at generating randomness. Computers are very good at printing reliable outputs because they essentially run on precise mathematical constants. In theory, you could program a computer to be random; but what is random?

At this point without delving into quantum mechanics and subatomic particles, it is best to stick with what we have already that passes as random and is considered as close to random as possible. The randomness that we employ in slot Machines, in our programming practices.

It is your Birthday, Problem

Python has a library called the random.py. It is supposed to generate a random float number between 0 and 1 in its default. It can also be utilized to print a random item from a string.

>>>import random
>>>random.randint(1, 15)
13
>>>

or you can let it run loose between 0 and 1

>>> import random
>>> random.random()
0.4356670321428783
>>>

They both look random enough. How does python manage this?

Python random module uses a generator, called a pseudo-random generator. Allen B Downey in his paper aptly named ‘’Generating Pseudo-random Floating-Point Values’’; describes that;

…to generate uniformly-distributed pseudo-random floating-point values in the range [0, 1], assuming that we are given an algorithm for generating pseudo-random bits. These bits should have a probability of 50% of being in each of two states, and be statistically independent. Our approach is to choose floating-point values in the range such that the probability that a given value is chosen is proportional to the distance between it and its two neighbors.

Can we test if it really is random?

https://xkcd.com/221/

You can run PRNG via Python or anything for that matter, you will realize there is going to be a lot of duplicates; more often than you would expect.

The Birthday Problem simply means you will get more duplicates for a given set of values when you run a random number generator. You would expect duplicates of course but this effect comes into play when the likelihood and the frequency of recurring values are already statistically higher than you would expect. It is about our organic random-checker mechanism, our cognitive brain; falling short on judging the randomness of value sets.

It is called ‘’Birthday Problem’’ because, in a group of people, some are inevitably going to have the same birthdays. Thanks to another explicably named phenomenon, Pigeonhole Principle; the probability of people with shared birthdays reaches 100% when our group is expanded to 366 (365 days in a year). Here is the part where our brain denies due to cognitive bias:

This probability reaches about 99% percent with only 57 people and there is a 50% chance with only 23 people. This, of course, is being exploited already and utilized in cryptographic attacks (Also named Birthday attacks).

Pigeonholes

Let us start with an experiment. Pick five cards from a deck (which usually has 52 cards) thanks to the pigeonhole principle since each card can belong to one of four (Hearts, Spades, Diamonds, Clubs) two or more must belong to the same suit. So this translates that for a given number of “boxes” (pigeonholes) and “input” (pigeons) if the average of these values ends up more than “1”, it is inevitable to have more than 1 input for a given box.

Circling back to our card experiment :

5 cards (pigeons) / 4 suits (pigeonholes)

or in other words :

If n objects are distributed among k > 0 boxes, then at least one box contains at most N/k objects.

Mersenne Twister

Python, for its PRNG uses the Mersenne Twister algorithm. It is in fact very popular for non-security purposes. It can produce a 32-bit output. You can generate a lot of numbers before the sequence starts repeating itself. Again, with a generously large sample, it is easy to start hitting the boundaries of a PRNG. That is one of the reasons rendering it unfeasible for security reasons.
At this point, if you absolutely would like to reduce the chances of duplicates and not dive into any birthday issues just utilize random.shuffle :

>>> import random
>>> a = [1, 2, 3, 4, 5]
>>> random.shuffle(a)
>>> print(a)

This will take a list and shuffle it randomly. You can first randomize a sample set and then shuffle it to steer away from dupes. This is especially useful for presenting data sets as random to uneducated eyes and avoid triggering cognitive bias.

Unit Testing

Last but not least, we can use “random” effectively in unit testing.

  • The overall code we will be using to generate values over the problem domain is going to be more readable and simple.
  • Test outputs being irregularly allocated; helps avoid singularity.
def func_unit_test (k, p):return result

assume k and p are floats ranging from kmin & kmax , pmin & pmax respectively.

Instead of using random module alone, we can use seed to make sure the random outputs of this test differs from other threads that may present. By utilizing Mersenne Twister Algorithm and providing a seed or random module; we are effectively creating our own randomizer for this specific test event. If not seeded, it is always seeded to 1 by default.

random = Random((seed,m))for i in xrange(m):yield random.random(kmin,kmax), random.random(pmin,pmax)

This will also ensure that you will get the same outputs generated for a specifically given seed.

Random module is undeniably useful on numerous domains not limited to testing, gaming or cryptography. You can effectively generate statistical projections, sort through scientific data and make meaningful analysis by employing it correctly.

For the closure argument, I would like to quote from John Von Neumann who is also been quoted by Donald Knuth in his opus magnum “The Art of Computer Programming”.

“Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin”.

The UX Collective donates US$1 for each article published on our platform. This story contributed to Bay Area Black Designers: a professional development community for Black people who are digital designers and researchers in the San Francisco Bay Area. By joining together in community, members share inspiration, connection, peer mentorship, professional development, resources, feedback, support, and resilience. Silence against systemic racism is not an option. Build the design community you believe in.

--

--

Passionate about, Creative Writing & Fiction, UI, Software Philosophy. Enthusiastic about Flutter, TLA+, Python, Decentralized systems. / azerkaandasdemir.com