The Rabin-Karp Algorithm Explained

The Rabin-Karp algorithm is a string matching/searching algorithm developed by Michael O. Rabin and Richard M. Karp. It uses hashing technique and brute force for comparison, and is a good candidate for plagiarism detection.

Important terms

pattern is the string to be searched. Consider length of pattern as M characters.
text is the whole text from which the pattern is to be searched. Consider length of text as N characters.

What is brute force comparison?

In brute force comparison each character of pattern is compared with each character of text until characters that don't match are found.

How the Rabin-Karp Algorithm Works

Calculate hash value of pattern
Calculate hash value of first M characters of text
Compare both hash values
If they are unequal, calculate hash value for next M characters of text and compare again.
If they are equal, perform a brute force comparison.

hash_p = hash value of pattern
hash_t = hash value of first M letters in body of text
do
    if (hash_p == hash_t) 
        brute force comparison of pattern and selected section of text
    hash_t= hash value of next section of text, one character over
while (end of text or brute force comparison == true)

Advantage over Naive String Matching Algorithm

This technique results in only one comparison per text sub-sequence and brute force is only required when the hash values match.