Selection with Replacement
I had a problem when testing some code that did random selection from a large pool: it was showing more duplicates than expected. When sampling from a pool of 500, we were seeing duplicates very frequently. It turns out this is the birthday problem: in a group of 23 people, there is a better than even chance that two or more people will share a birthday.
The amazing part of this problem is that even after doubling our pool to 1000, after just 40 picks there is a better than even chance that we will have a duplicate.
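A quick way to convince yourself of these numbers is to simulate the picks. The following is only a rough sketch: the helper name duplicate_rate, the trial count and the seed are all made up for illustration. It picks m values with replacement from a pool of n and counts how often at least one value repeats.

```ruby
# Hypothetical helper: estimates how often m picks (with replacement)
# from a pool of n items contain at least one duplicate.
def duplicate_rate(n, m, trials: 20_000, seed: 42)
  rng = Random.new(seed)                   # fixed seed so runs are repeatable
  hits = trials.times.count do
    picks = Array.new(m) { rng.rand(n) }   # m picks, each an integer in 0...n
    picks.uniq.length < m                  # true if any value was repeated
  end
  hits.to_f / trials                       # fraction of trials with a duplicate
end

puts duplicate_rate(365, 23)    # birthday numbers
puts duplicate_rate(1000, 40)   # pool of 1000, 40 picks
```

Both estimates should come out a little above one half, with some noise depending on the seed and the number of trials.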
The formula is as follows: for a pool of n items, with m selected,
the chance of no duplicate is n!/((n-m)! * n^m), so the chance of at least one duplicate is 1 - n!/((n-m)! * n^m)
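The code below shows this using irb, first with the standard birthday numbers (just to check the formula) and then with a pool of 1000. Note that the values printed are the percentage chance of no duplicate, truncated by integer division, so 49 and 45 mean the chance of a duplicate is just over 50% and 54% respectively.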
irb(main):001:0> class Integer
irb(main):002:1>   def factorial
irb(main):003:2>     (2..self).inject(1) { |f, n| f * n }
irb(main):004:2>   end
irb(main):005:1> end
=> nil
irb(main):006:0> (365.factorial * 100)/(342.factorial * 365**23)
=> 49
irb(main):007:0> (1000.factorial * 100)/(960.factorial * 1000**40)
=> 45
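The session above builds the full factorials, which Ruby's big integers cope with but which get large quickly. As a possible alternative, here is a minimal sketch that multiplies the ratios (n-k)/n one at a time and returns the duplicate chance directly as 1 minus the no-duplicate chance. The method name duplicate_chance is invented for this example, and it uses floating point rather than exact integers.

```ruby
# Hypothetical helper: chance of at least one duplicate when picking
# m items with replacement from a pool of n.
def duplicate_chance(n, m)
  # Chance that every pick differs: n/n * (n-1)/n * ... * (n-m+1)/n
  no_duplicate = (0...m).reduce(1.0) { |p, k| p * (n - k) / n }
  1.0 - no_duplicate
end

puts((duplicate_chance(365, 23) * 100).round(1))    # 50.7
puts((duplicate_chance(1000, 40) * 100).round(1))   # 54.6
```

Writing the formula as a running product avoids the huge intermediate numbers, and it is algebraically the same as n!/((n-m)! * n^m).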