Tuesday 27 March 2012

Network Worm Detection


Worms are self propagating programs that spread by exploiting a latent software vulnerability in some popular network service- such as email, web or terminal access. A computer worm is a malware program that spreads to other computers by replicating itself. Most of the worms are designed to spread and not to affect the systems they pass through. Worms are mostly used for spamming purposes. A very common use of worms is to create zombie computer under the control of the worm author. Such computers are used by spam senders for sending junk email or to cloak their website's address.

I will describe a algorithm to detect such an attack. The basic approach of detection, characterization and containment has not changed over the years. First of all, new worms are detected by intrusion detection systems. Then, after isolating an instance of the worm, skilled security professionals manually characterize a worm signature and finally,this signature is used to contain subsequent infections via updates to anti-virus software. But this approach is insufficient. It requires isolating a new worm and then a skilled professional works on it to decode the sequence, looks for invariant code sequences and then tests the signatures  for uniqueness. But recent reports have shown that effective worm containment can require a reaction time of under 1 minute.

First observation about worms is that some portion of the content in existing worm is invariant. It is typically the code containing the latent host vulnerability. Second observation is that the spreading dynamics of a worm is atypical of Internet applications. Network worms tend to behave quite differently from the popular client-server and peer-to-peer applications deployed on today's networks.

These key behaviours can be exploited to detect and characterize network worms. In all existing worms, some of the program is invariant across every host it infects. Some worms make use of limited polymorphism by encrypting each worm instance independently and /or randomising filler text. In these cases, much of the worm body is variable, but key portions are still invariant(eg. the decryption routine). From these observations, we can conclude that network worms must generate significant traffic to spread and that this traffic will contain common sub string and that this traffic will be directed between a variety of network hosts.


In principle, detecting this traffic pattern is relatively straightforward. For each network packet, the content is extracted and all substrings processed. Each substring is indexed into a prevalence table that increments a count for a given substring each time it is  found. In effect, this table implements a histogram of all observed substrings. To maintain a count of unique source and destination  addresses, each table entry also maintains two lists, containing IP addresses, that are searched and potentially updated each time a  substring count is incremented. Sorting this table on the substring count and the size of the address lists will produce the set of likely  worm traffi c. Better still, the table entries meeting this worm behavior criteria are exactly those containing the invariant substrings of the worm. It is these sub- strings that can be used as signatures to alter the worm out of legitimate network traffic. Network content which is not prevalent or not widely dispersed is sifted out, leaving out the worm content.

This is a very high level view of the algorithm for worm detection which does not scale well.  But this can be a good starting point for understanding the practical worm detection algorithm.