## Probability of overlapping sub-sequences?

October 8th, 2011

Consider a periodic sequence, seqp, with period p

seqp = a1, a2, …, ap,  a1, a2, …, ap,  a1, a2, …, ap, …

Two sub-sequences, each of length N where 2< N< p, are produced by choosing two random start points in seqp and using the next N-1 numbers. What is the probability that these two sub-sequences will overlap?

If you take M>2 such length N sub-sequences then what is the probability that any two will overlap?

For a more concrete example, assume that p = 2^19937 – 1 (yes, a very big number) and consider 10,000 sub-sequences each of length 10^9.

Disclaimer: I don’t have the solution to this and haven’t yet tried to find it