I had cause, recently, to write a little function that randomises a name for a resource from a list of roughly 90 words and a 3digit postfix, in the form ‘happycat267′*. Now, I don’t want to reuse a name that’s gone before, so the function checks alreadydeployed resources and if the name is taken, the function recurses to pick a new name from my ~90,000 possiblities.
Of course, when it came to testing this function, I wanted to be reasonably sure that the recursion worked and would continue to work in the future, so I decided to write a simple Pester test that loops a certain number of times and doesn’t return a duplicate value.
But this raised a question. How many iterations should this test go through to be reasonably sure of hitting a duplicate and thus triggering a recursion?
Obviously to fully test, I’d need to run the function 90,000+ times. But that’s computationally expensive and would slow down my testing. I don’t want to do that. But how many is the least number of tests I can do to be reasonably sure of a duplicate appearing?
Which brings me to the Birthday Problem, aka the Birthday Paradox.
Stated simply, the Birthday Problem tells us that in a group of only 23 people, it’s more likely than not to find two people who share a birthday. And if you have 70 people in a room, the probability of at least one birthday match is up at 99.9%.
The Wikipedia article goes into a lot of detail on why this is mathematically true, but the astute among us whill have noticed that this mathematical phenomenon has an application in unit testing my little “randomnamewithoutcollisions” function.
There are Birthday Problem calculators online, such as this one, so I plugged in my possibility space of 90,000 and started playing with iterations. It turns out that if I call my function 1000 times, there’s a >99.5% chance that I’ll produce (and therefore handle) a duplicate.
At 500 iterations, the probability drops to around 75%, and at 300 iterations, it’s around 40% – so clearly 300 iterations is too low, that is it’s more likely that I won’t hit a duplicate.
So, I wrote my own function to calculate the probabilities for me
I can call this in a loop, and calculate a table of probabilities from 1 iteration to 10,000 (and graph that, if I feel like it).

(1..10000)  % { $probability = GetCollisionChance 90000 $_ [pscustomobject]@{Iterations = $_ ; Probability = $probability} } 
I can then use that table to zoom in on a probability I feel comfortable with. Let’s say I’m happy with a 99% probability of collision, it turns out that 911 iterations will get me there. If I’m happy with 90%, 644 iterations will do it.
Above 911 iterations, the curve plateaus out and the returns from adding more iterations become smaller. We hit a point where PowerShell rounds the probability up to 1 at 2519 iterations. It’s not mathematically certain at this point, but we’re up around a 99.9999999999999% chance of collision.
So we can see there’s really not much point iterating above 2500 repetitions. The increased probability just isn’t worth the extra processor cycles.
So anyway, with a little mocking, I can write a useful test that has a 99% chance of hitting at least one duplicate in the function, and test that it doesn’t actualy return any dupes, thus:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Describe "Getting a random name" { # Mock this so we don't exhaust API calls by repeatedly calling for a list $currentresources = GetAlreadyDeployedResources Mock GetAlreadyDeployedResources { return $currentresources } It "Never returns duplicates (as far as we can tell)" { $reslist = @() (1..911)  % { # get a random resource name $reslist += GetRandomResourceName  teeobject Variable plusone # add that back to the list we mocked $currentresources += [pscustomobject]@{ ResourceName = $plusone; ResourceID = 'fakeidentifier' } } # count how many names we have $all = ($reslist  measureobject  selectobject expand Count) #check how many unique names are in that list $unique = ($reslist  selectobject unique  measureobject  selectobject expand Count) # they should match $all  Should Be $unique } } 
Anyway, this is what I spent yesterday afternoon researching and playing around with. Hopefully someone other than me will find it useful. If not, well I had fun.
* not one of the actual values.