COVFEFE

Mr. Trump decides to post a random message on Facebook and he starts typing a random sequence of letters $\{U_k\}_{k\geq1}$ such that they are chosen independently and uniformly from the 26 possible english alphabets. Find out the expected time of first appearance of the word COVFEFE.

This problem can be thought as an example of Markov chain.

For the sequence COVFEFE,

  • consider a sequence of states x, a, b, c, d, e, f
    • such that, state
      • x: starting from scratch
      • a: starting after C
      • b: starting after CO
      • c: starting after COV
        ...so on
        That is,
        .  C O V F E F E  
        .  x - - - - - -  
        .    a - - - - -  
        .      b - - - -  
        .        c - - -  
        .          d - -  
        .            e -  
        .              f

Above could be represented as a transition model of a Markov chain. Each edge in this model is probability of transitioning between states

.   <--          <-----------------------------------------------
.  |   |        |   |        |        |        |        |        |
.  |   |25/26   |   |1/26    |1/26    |1/26    |1/26    |1/26    |1/26   
.  |   |        |   |        |        |        |        |        |
.   -> x -----> a ---------> b -----> c -----> d -----> e -----> f -----> DONE    
       |  1/26  |   1/26     |  1/26  |  1/26  |  1/26  |  1/26  |  1/26
       |        |            |        |        |        |        |
.      |        |24/26       |24/26   |24/26   |24/26   |24/26   |24/26
.      |        |            |        |        |        |        |  
.      <---------------------------------------------------------

Looking at above, we can construct a transition table (matrix) as:

$ x = \frac{1}{26} * (1+a) + \frac{25}{26} * (1+x) \\ a = \frac{1}{26} * (1+a) + \frac{1}{26} * (1+b) + \frac{24}{26} * (1+x) \\ b = \frac{1}{26} * (1+c) + \frac{1}{26} * (1+a) + \frac{24}{26} * (1+x) \\ ...\\ f = \frac{1}{26}*(1) + \frac{1}{26}(1+a) + \frac{24}{26} * (1+x)\\ $

$ \begin{bmatrix} x\\ a\\ b\\ c\\ d\\ e\\ f\\ \end{bmatrix} = \frac{1}{26} * \begin{bmatrix} 25 & 1 & 0 & 0 & 0 & 0 & 0\\ 24 & 1 & 1 & 0 & 0 & 0 & 0\\ 24 & 1 & 0 & 1 & 0 & 0 & 0\\ 24 & 1 & 0 & 0 & 1 & 0 & 0\\ 24 & 1 & 0 & 0 & 0 & 1 & 0\\ 24 & 1 & 0 & 0 & 0 & 0 & 1\\ 24 & 1 & 0 & 0 & 0 & 0 & 0\\ \end{bmatrix} * \begin{bmatrix} x\\ a\\ b\\ c\\ d\\ e\\ f\\ \end{bmatrix} + \frac{1}{26} * \begin{bmatrix} 25 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\ 24 & 1 & 1 & 0 & 0 & 0 & 0 & 0\\ 24 & 1 & 0 & 1 & 0 & 0 & 0 & 0\\ 24 & 1 & 0 & 0 & 1 & 0 & 0 & 0\\ 24 & 1 & 0 & 0 & 0 & 1 & 0 & 0\\ 24 & 1 & 0 & 0 & 0 & 0 & 1 & 0\\ 24 & 1 & 0 & 0 & 0 & 0 & 0 & 1\\ \end{bmatrix} * \begin{bmatrix} 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ \end{bmatrix} $

X =                         A.X         +             B  

=> (I-A).X = B
In [4]:
import numpy as np
A = 1./26 * np.array([
    [25, 1, 0, 0, 0, 0, 0],
    [24, 1, 1, 0, 0, 0, 0],
    [24, 1, 0, 1, 0, 0, 0],
    [24, 1, 0, 0, 1, 0, 0],
    [24, 1, 0, 0, 0, 1, 0],
    [24, 1, 0, 0, 0, 0, 1],
    [24, 1, 0, 0, 0, 0, 0]
])

B = np.hstack((A, 1./26*np.array([0,0,0,0,0,0,1]).reshape(7,1))).dot(np.ones(8))

I = np.eye(7)
print("Expected length of random string of letters {0:3.5g} billions".format(np.linalg.solve((I-A), B)[0]/1e9))
Expected length of random string of letters 8.0318 billions

Alternate Method. Calculus (approximate solution)

$E = \int_{x=1}^{\infty} x.a^{x-1}.b\ dx = \frac{b}{ln(a)}.(\frac{1}{ln(a)}-1)$,

  • where,
    • b: probability of correct letter selection at a position = $\frac{1}{26^7}$
    • a: probability of incorrect selection at a position (=1-b)
In [2]:
import numpy as np
pSel = lambda Npos, Nchar: (1./Nchar)**Npos # prob of selection when Number of char=Nchar and Npos are number of positions to fill
pNonSel = lambda Npos, Nchar : 1.-pSel(Npos, Nchar) # prob of not selecting a sequence at any given time
pNonSel_ForNtimes = lambda Ntimes, Npos, Nchar : (pNonSel(Npos, Nchar))**Ntimes

Npos, Nchar = 7, 26
print(pSel(Npos, Nchar) * 1./np.log(pNonSel(Npos, Nchar)) * (1./np.log(pNonSel(Npos, Nchar)) - 1.) / 1e9, 'billions')
8.03180664333 billions