Malicious hackers and IT defenses are in an arms race. Deep-packet inspection must keep improving in speed and sophistication as networks move to 40 Gbps and beyond. Signature-based detection systems need fast, reprogrammable hardware for complex signature analysis. Spotting “zero-day” attacks and polymorphic malware drives the need for novel and flexible systems.
The avalanche of attacks on computers and networks worldwide by hackers is widely recognized as a serious problem, and it is increasing at an ever faster rate. Browser hijackers, ransomware, keyloggers, backdoors, rootkits, Trojan horses, worms, spyware, denial of service attacks, and other variations are inflicted on us in an arms race increasingly fueled by sophisticated criminal groups (and some governments), with motivations ranging from financial theft to industrial espionage and sabotage.
Many defensive strategies are deployed at different points in the network. Firewalls, intrusion-detection and prevention systems, anti-virus scanners and other solutions need constant upgrades to cope with the malware onslaught.
This top-10 virus chart illustrates how frequently “new” malware is simply a variant of a familiar one. The 2nd, 4th and 6th positions in the ranking are all variants of Trojan-PSW.Win32.Tepfer. They steal cookies, passwords, and email details.
Signature-based detection has long been a mainstay defensive strategy. These methods look for known patterns of data in files attached to emails, in files already on disk, or inside inbound network packet payloads (deep-packet inspection). Although the signature-based approach can effectively contain known virus outbreaks, malware authors can stay a step ahead of simple signature searches by writing “oligomorphic,” “polymorphic,” or “metamorphic” viruses, which modify and/or encrypt parts of their code over time to hide from those searches.
To counter these variants, heuristic searches or “generic signatures” are used to identify them by looking for many variations of known malicious code patterns. Using more complex wildcarded regular expressions, the search can better detect a new malware variant even if it is padded with extra, meaningless code, re-arranged, and/or partially encrypted. The compute load for examining so many variations is much larger than for simple pattern searches, and it grows as wired and wireless speeds keep increasing.
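The difference between a simple signature and a generic one can be sketched in a few lines of Python. This is only an illustration: the byte sequence, the 16-byte gap limit, and the function names are invented for the example and do not come from any real signature database.

```python
import re

# Hypothetical 6-byte signature, invented for illustration only.
EXACT_SIGNATURE = b"\xde\xad\xbe\xef\x13\x37"

# Hypothetical "generic signature": the same bytes in the same order, but
# tolerating up to 16 arbitrary filler bytes between the two halves.
GENERIC_SIGNATURE = re.compile(rb"\xde\xad\xbe.{0,16}?\xef\x13\x37", re.DOTALL)


def exact_match(payload: bytes) -> bool:
    """Simple signature search: the bytes must appear contiguously."""
    return EXACT_SIGNATURE in payload


def generic_match(payload: bytes) -> bool:
    """Wildcarded search: also matches variants padded with filler bytes."""
    return GENERIC_SIGNATURE.search(payload) is not None


original = b"HEADER" + EXACT_SIGNATURE + b"TRAILER"
# A variant with four meaningless bytes inserted in the middle of the pattern.
padded = b"HEADER" + b"\xde\xad\xbe" + b"\x90" * 4 + b"\xef\x13\x37" + b"TRAILER"

for name, payload in (("original", original), ("padded", padded)):
    print(name, "exact:", exact_match(payload), "generic:", generic_match(payload))
```

The padded variant slips past the exact search but is still caught by the wildcarded expression, at the cost of a more expensive match.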
Following is an example of a complex security rule, taken from Snort, converted to an automaton that runs directly on the Automata Processor. This particular rule is designed to capture a buffer overflow attack on an Apache web server.
The rulesets behind the test results shown in the table below were extracted from Snort and modified to increase the number of patterns that include character classes, and to increase the percentage of patterns that use unbounded repetitions of wildcards.
The Snort rules were implemented in the Automata Processor SDK tool chain to quantify the following: the number of regular expressions in each dataset, the number of NFA states needed to implement the dataset, the number of state transition elements (STEs) used after configuration of the chip by Micron's AP compiler, and the percentage of AP chip capacity consumed.
| Ruleset | Description | Number of RegEx | NFA States | STEs Used | % of chip used |
|---|---|---|---|---|---|
| EM | only exact match patterns | 1k | 28.7k | 29.7k | 78 |
| Range 5 | 50% of patterns have char-classes | 1k | 28.5k | 29.3k | 78 |
| Range 1 | 100% of patterns have char-classes | 1k | 29.6k | 30.4k | 80 |
| Dot-star 0.5 | 5% of patterns have ".*" | 1k | 29.1k | 30.0k | 77 |
| Dot-star 0.1 | 10% of patterns have ".*" | 1k | 29.2k | 30.0k | 77 |
| Dot-star 0.2 | 20% of patterns have ".*" | 1k | 28.7k | 30.0k | 77 |
| Dot-star 0.3 | 30% of patterns have ".*" | 1k | 28.7k | 4.5k | 76 |

| Ruleset | Throughput: 1 AP device | Throughput: 1 rank of 8 AP devices | Throughput: 1 PCIe card (4 ranks) |
|---|---|---|---|
| Backdoor | 1 Gbps | 2 Gbps | 8 Gbps |
| Spyware | 1 Gbps | 2 Gbps | 8 Gbps |
| EM | 1 Gbps | 2 Gbps | 8 Gbps |
| Range 5 | 1 Gbps | 2 Gbps | 8 Gbps |
| Range 1 | 1 Gbps | 2 Gbps | 8 Gbps |
| Dot-star 0.5 | 1 Gbps | 2 Gbps | 8 Gbps |
| Dot-star 0.1 | 1 Gbps | 2 Gbps | 8 Gbps |
| Dot-star 0.2 | 1 Gbps | 2 Gbps | 8 Gbps |
| Dot-star 0.3 | 1 Gbps | 2 Gbps | 8 Gbps |
The results show that the number of state transition elements used corresponds nearly 1-to-1 with the number of NFA states, and that resource utilization does not grow with expression complexity. A ruleset of 1,000 regular expressions fits in a single Automata Processor chip, using 76 to 80% of its capacity, and is processed at 1 Gbps per chip. A rank of 8 chips configured as 2 groups, each holding the entire Snort rule set, would run at 2 Gbps. A PCIe card with 4 ranks would run at 8 Gbps, and further scaling can be obtained by adding more cards to the system.
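The scaling arithmetic above can be written out as a short Python sketch. The per-chip rate, rank size, and card size are taken from the text. The function names are made up for the example, and the sketch assumes that each group handles its own input stream at the single-chip rate and that traffic can be split evenly across groups and cards.

```python
import math

GBPS_PER_GROUP = 1   # a group of chips scans one input stream at the 1 Gbps chip rate
CHIPS_PER_RANK = 8   # from the text: a rank holds 8 AP devices
RANKS_PER_CARD = 4   # from the text: a PCIe card holds 4 ranks


def rank_throughput_gbps(groups_per_rank: int) -> int:
    """Aggregate rate of one rank split into equal groups of chips.

    Assumption: every group holds a full copy of the rule set and scans a
    separate stream, so rates add up across groups.
    """
    if groups_per_rank < 1 or CHIPS_PER_RANK % groups_per_rank != 0:
        raise ValueError("groups_per_rank must divide the 8 chips evenly")
    return groups_per_rank * GBPS_PER_GROUP


def card_throughput_gbps(groups_per_rank: int) -> int:
    """Aggregate rate of one PCIe card (4 ranks)."""
    return RANKS_PER_CARD * rank_throughput_gbps(groups_per_rank)


def cards_needed(target_gbps: float, groups_per_rank: int) -> int:
    """Cards required for a target line rate, assuming an even traffic split."""
    return math.ceil(target_gbps / card_throughput_gbps(groups_per_rank))


# STEs used per NFA state for two rows of the table (both values in thousands).
TABLE_ROWS = {"EM": (28.7, 29.7), "Range 1": (29.6, 30.4)}
for name, (nfa_states, stes_used) in TABLE_ROWS.items():
    print(f"{name}: {stes_used / nfa_states:.3f} STEs per NFA state")

print("rank:", rank_throughput_gbps(2), "Gbps")
print("card:", card_throughput_gbps(2), "Gbps")
print("cards for a 40 Gbps link:", cards_needed(40, 2))
```

With 2 groups per rank this reproduces the 2 Gbps and 8 Gbps figures, and under the same assumptions a 40 Gbps link would take five cards.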
The massively parallel Automata Processor (AP) technology can handle extremely complex, heavily wildcarded regular expression searches at unprecedented speeds. Complex heuristic searches for morphed malware patterns can be done much more cost-effectively than with unaided traditional CPUs. AP-based accelerator cards are also much more feasible to deploy in high-density server datacenters because of their very low power draw, just a few watts per device. (See bottom note for more detail.)
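The parallel matching described above can be pictured with a small software model of a nondeterministic automaton in which each state tests the current input byte against its own symbol set. This is a rough sketch only, not the AP SDK or its API. The `State` class, the `run` function, and the pattern (the bytes `GET`, any number of arbitrary bytes, then two `0x90` bytes) are all hypothetical. On the hardware every enabled state would evaluate the byte at once; here a loop stands in for that.

```python
from dataclasses import dataclass

ANY_BYTE = frozenset(range(256))


@dataclass(frozen=True)
class State:
    symbols: frozenset        # input bytes this state accepts
    next_states: tuple = ()   # states enabled for the next byte when this one fires
    start: bool = False       # enabled on every byte (unanchored search)
    report: bool = False      # firing this state signals a match


# Hypothetical automaton for: G E T, any bytes (".*"), then 0x90 0x90.
STATES = (
    State(frozenset(b"G"), (1,), start=True),   # 0
    State(frozenset(b"E"), (2,)),               # 1
    State(frozenset(b"T"), (3, 4)),             # 2: may skip the wildcard
    State(ANY_BYTE, (3, 4)),                    # 3: ".*" is a single self-looping state
    State(frozenset(b"\x90"), (5,)),            # 4
    State(frozenset(b"\x90"), report=True),     # 5
)


def run(states, data: bytes) -> list:
    """Return the offsets at which a reporting state fires."""
    start_ids = {i for i, s in enumerate(states) if s.start}
    enabled = set(start_ids)
    report_offsets = []
    for offset, byte in enumerate(data):
        # All enabled states look at the same byte; those that accept it fire.
        active = {i for i in enabled if byte in states[i].symbols}
        if any(states[i].report for i in active):
            report_offsets.append(offset)
        # Fired states enable their successors; start states stay enabled.
        enabled = set(start_ids)
        for i in active:
            enabled.update(states[i].next_states)
    return report_offsets


print(run(STATES, b"GET /index" + b"\x90\x90"))
print(run(STATES, b"GE /index" + b"\x90\x90"))
```

In this model the unbounded wildcard costs one extra state rather than a blow-up in state count, and the work per input byte does not depend on how many alternatives are being tracked, only on the hardware's ability to evaluate them together.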