Saturday 12 November 2011

Efficient way to find largest unique substring from a given string

The algorithm which I am proposing, takes O(n) time and O(1) space. The unique sub string in this problem is a sub string having all the characters distinct.

I am assuming that the string contains ASCII characters. ASCII includes definition of 128 characters. If we represent each ASCII character by 1 bit, we require 16 bytes or 4 integers to represent all.

ALGORITHM
Its very simple.  Take a hash table of 4 integers initialized to 0. Just keep on reading characters of the string and check whether that character is already in the current sub string by checking whether the corresponding bit in the hash table is set.
        If it is not set, you advance the end pointer of the sub string and then check whether the current unique sub string is larger than the maximum length unique sub string till now and accordingly update.
        If it is set, you advance the start pointer of the current sub string till the point that bit is unset and follow the loop.

CODE:
Here is the C code for it :

'start' and 'end' mark the current substring processed.
'mstart' and 'mend' mark the longest found substring till now
str[] is the given string


At the end, the sub string between mstart and mend is the required sub string.


No comments:

Post a Comment