Searching for a counterexample to the Kurepa’s left factorial hypothesis (p < 10^9) [OLD]

[update (09/03/14): Vladica Andrejic and I wrote a paper regarding this problem and extended the search to 2^34 »]

Kurepa’s left factorial hypothesis states that $!p \not\equiv 0 \pmod p$, for all $p > 2$, where $!p = 0! + 1! + \ldots + (p-1)!$. This conjecture is also listed as a problem B44 in Guy’s “Unsolved problems in number theory”.

Official proof was published in 2004 by D. Barsky and B. Benzaghou “Nombres de Bell et somme de factorielles”. [update (12/12/12): This ‘proof’ turned out to be false »]

Vladica Andrejic (University of Belgrade) was suspicious that proof isn’t valid and conjectured that there are good chances that counterexample lies in the interval $(10^{10}, 10^{27})$. Using the existing hardware, it isn’t possible to check all values from this interval, so I decided to extend the search beyond $2^{27}$, which was covered in the latest attempt by P. Jobling.

Unfortunately, I didn’t find any counterexamples, but $2$ new values for $\mid r\mid < 10$ were found.

Some previous searches:

My results for $\mid r\mid < 100$ and $1.44 \cdot 10^{8} < p < 10^{9}$

 $p$ $!p \mod p$ $145946963$ $-49$ $171707099$ $52$ $301289203$ $-57$ $309016481$ $92$ $309303529$ $26$ $345002117$ $-5$ $348245083$ $-83$ $353077883$ $6$ $441778013$ $-27$ $473562253$ $25$ $499509403$ $-74$ $530339209$ $-24$ $594153589$ $14$ $653214853$ $78$ $709692847$ $-38$ $758909887$ $-85$
Several optimization tricks were used in order to reduce the number of required multiplications. This implementation required only ~1.8 multiplications per iteration. Also, a big advantage of modern CPUs is ability to multiply two 64bit registers without overflow. Four values of p were processed in the same loop in order to keep CPU-core busy. The program was written in asm/C, and ran at i7 quad-core CPU.