Method and system for adaptive rule-based content scanners

US 20050108554A1
Filed: 08/30/2004
Published: 05/19/2005
Est. Priority Date: 11/06/1997
Status: Active Grant

5 Assignments

0 Petitions

Abstract

A method for scanning content, including identifying tokens within an incoming byte stream, the tokens being lexical constructs for a specific language, identifying patterns of tokens, generating a parse tree from the identified patterns of tokens, and identifying the presence of potential exploits within the parse tree, wherein said identifying tokens, identifying patterns of tokens, and identifying the presence of potential exploits are based upon a set of rules for the specific language. A system and a computer readable storage medium are also described and claimed.
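
The first step of the abstract, identifying tokens in a byte stream from a rule set for one language, can be sketched as follows. This is a minimal illustration and not the patented implementation: the rule table `TOKEN_RULES`, the token names, and the `tokenize` function are all hypothetical, and the tiny JavaScript-like grammar is an assumption made for the example.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    name: str  # lexical category, e.g. IDENT
    text: str  # the matched characters


# Hypothetical lexical rules for a tiny JavaScript-like language.
# Swapping this table is what would adapt the scanner to another language.
TOKEN_RULES = [
    ("WS", r"\s+"),
    ("STRING", r'"[^"]*"'),
    ("IDENT", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("NUMBER", r"\d+"),
    ("PUNCT", r"[().;=+,{}\[\]]"),
]
# One alternation of named groups; the group that matched names the token.
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_RULES))


def tokenize(data: bytes) -> Iterator[Token]:
    text = data.decode("latin-1")  # maps every byte to exactly one character
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # No rule applies: emit the character as UNKNOWN and move on.
            yield Token("UNKNOWN", text[pos])
            pos += 1
            continue
        pos = m.end()
        if m.lastgroup != "WS":  # whitespace is consumed but not reported
            yield Token(m.lastgroup, m.group())
```

For the input `b'document.write("x");'` this yields the token names IDENT, PUNCT, IDENT, PUNCT, STRING, PUNCT, PUNCT. Keeping the rules in a data table rather than in code is one plausible reading of "based upon a set of rules for the specific language".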

181 Citations

43 Claims

1. A method for scanning content, comprising:
- identifying tokens within an incoming byte stream, the tokens being lexical constructs for a specific language;
  
  identifying patterns of tokens;
  
  generating a parse tree from the identified patterns of tokens; and
  
  identifying the presence of potential exploits within the parse tree, wherein said identifying tokens, identifying patterns of tokens, and identifying the presence of potential exploits are based upon a set of rules for the specific language.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
  - 2. The method of claim 1 further comprising converting the incoming byte stream to a reduced set of character codes.
  - 3. The method of claim 1 further comprising decoding character sequences according to an escape encoding.
  - 4. The method of claim 1 wherein said generating a parse tree is based upon a shift-and-reduce algorithm.
  - 5. The method of claim 1 wherein the set of rules expresses exploits in terms of patterns of tokens.
  - 6. The method of claim 1 wherein the set of rules includes actions to be performed when corresponding patterns are matched.
  - 7. The method of claim 1 wherein the specific language is JavaScript.
  - 8. The method of claim 1 wherein the specific language is Visual Basic VBScript.
  - 9. The method of claim 1 wherein the specific language is HTML.
  - 10. The method of claim 1 wherein the specific language is Uniform Resource Identifier (URI).
  - 11. The method of claim 1 for scanning a first type of content that has a second type of content embedded therewithin, further comprising recursively invoking another method in accordance with claim 1, for scanning the second type of content.
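
Claims 2 and 3 describe two pre-processing steps: reducing the byte stream to a smaller set of character codes, and decoding escape sequences. The sketch below shows one way these could look. The claims do not say which character codes are kept or which escape encodings are supported, so the choices here (collapsing whitespace, replacing non-printable bytes with `?`, and handling `%XX`, `\xXX` and `\uXXXX`) are assumptions, and the function names `normalize` and `decode_escapes` are hypothetical.

```python
import re

# Assumed escape forms: URI percent-encoding and two JavaScript-style escapes.
_ESCAPE = re.compile(
    r"%([0-9A-Fa-f]{2})|\\x([0-9A-Fa-f]{2})|\\u([0-9A-Fa-f]{4})"
)


def decode_escapes(text: str) -> str:
    """Replace each recognised escape sequence with the character it encodes."""
    def repl(m: re.Match) -> str:
        hex_digits = m.group(1) or m.group(2) or m.group(3)
        return chr(int(hex_digits, 16))
    return _ESCAPE.sub(repl, text)


def normalize(data: bytes) -> str:
    """Map the byte stream onto a reduced set of character codes (assumed set)."""
    out = []
    for ch in data.decode("latin-1"):
        if ch in "\t\r\n\f\v ":
            out.append(" ")   # every whitespace byte becomes a plain space
        elif ord(ch) < 0x20 or ord(ch) > 0x7E:
            out.append("?")   # non-printable and non-ASCII bytes share one code
        else:
            out.append(ch)    # printable ASCII passes through unchanged
    return "".join(out)
```

With these definitions, `decode_escapes("%3Cscript%3E")` returns `"<script>"` and `normalize(b"a\tb\xffc")` returns `"a b?c"`. Decoding before tokenizing matters because an exploit written as `%3Cscript%3E` would otherwise never produce the tokens that the rules look for.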

12. A system for scanning content, comprising:
- a tokenizer for identifying tokens within an incoming byte stream, the tokens being lexical constructs for a specific language;
  
  a parser operatively coupled to said tokenizer for identifying patterns of tokens, and generating a parse tree therefrom; and
  
  an analyzer operatively coupled to said parser for analyzing the parse tree and identifying the presence of potential exploits therewithin, wherein said tokenizer, said parser and said analyzer use a set of rules for the specific language to identify tokens, patterns and potential exploits, respectively.
- View Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27)
  - 13. The system of claim 12 further comprising a pre-scanner for identifying content that is innocuous.
  - 14. The system of claim 12 wherein said tokenizer comprises a normalizer for converting the incoming byte stream to a reduced set of character codes.
  - 15. The system of claim 12 wherein said tokenizer comprises a decoder for decoding character sequences according to an escape encoding.
  - 16. The system of claim 12 wherein said parser generates the parse tree using a shift-and-reduce algorithm.
  - 17. The system of claim 12 further comprising a pattern-matching engine operatively coupled to said parser and to said analyzer, for matching a pattern within a sequence of tokens.
  - 18. The system of claim 17 wherein the pattern is represented as a finite-state machine.
  - 19. The system of claim 17 wherein the pattern is represented as a pattern expression tree.
  - 20. The system of claim 17 wherein patterns are merged into a single deterministic finite automaton (DFA).
  - 21. The system of claim 12 wherein the set of rules expresses exploits in terms of patterns of tokens.
  - 22. The system of claim 12 wherein the set of rules includes actions to be performed when corresponding patterns are matched.
  - 23. The system of claim 22 further comprising a scripting engine for implementing the actions to be performed.
  - 24. The system of claim 12 wherein the specific language is JavaScript.
  - 25. The system of claim 12 wherein the specific language is Visual Basic script.
  - 26. The system of claim 12 wherein the specific language is HTML.
  - 27. The system of claim 12 wherein the specific language is Uniform Resource Identifier (URI).
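
Claims 17 to 20 mention a pattern-matching engine over token sequences, with patterns merged into a single deterministic finite automaton. The sketch below merges several token-sequence patterns into one automaton using an Aho-Corasick-style construction. That construction is chosen here for illustration; the claims do not name it. The pattern names, token names, and the functions `build_dfa` and `scan` are hypothetical.

```python
from collections import deque


def build_dfa(patterns):
    """Merge named token-sequence patterns into one automaton.

    patterns maps a pattern name to a tuple of token names. Returns
    (goto, outputs): goto[state][token] is the next state, and
    outputs[state] is the set of pattern names that end in that state.
    """
    goto, outputs = [{}], [set()]
    for name, seq in patterns.items():          # build a trie of all patterns
        state = 0
        for tok in seq:
            if tok not in goto[state]:
                goto.append({})
                outputs.append(set())
                goto[state][tok] = len(goto) - 1
            state = goto[state][tok]
        outputs[state].add(name)

    alphabet = {tok for seq in patterns.values() for tok in seq}
    fail = [0] * len(goto)
    queue = deque()
    for tok in alphabet:                        # complete the start state
        nxt = goto[0].get(tok)
        if nxt is None:
            goto[0][tok] = 0
        else:
            queue.append(nxt)
    while queue:                                # breadth-first completion
        state = queue.popleft()
        for tok in alphabet:
            nxt = goto[state].get(tok)
            if nxt is None:
                # Missing edge: reuse the edge of the fallback state.
                goto[state][tok] = goto[fail[state]][tok]
            else:
                fail[nxt] = goto[fail[state]][tok]
                outputs[nxt] |= outputs[fail[nxt]]
                queue.append(nxt)
    return goto, outputs


def scan(tokens, dfa):
    """Return (index, pattern name) for every pattern occurrence in tokens."""
    goto, outputs = dfa
    state, hits = 0, []
    for i, tok in enumerate(tokens):
        state = goto[state].get(tok, 0)  # a token in no pattern resets the match
        hits.extend((i, name) for name in sorted(outputs[state]))
    return hits
```

For example, with the patterns `{"write_call": ("DOCUMENT", "DOT", "WRITE", "LPAREN"), "eval_call": ("EVAL", "LPAREN")}` and the token stream `DOCUMENT DOT DOCUMENT DOT WRITE LPAREN EVAL LPAREN`, `scan` reports `write_call` ending at index 5 and `eval_call` ending at index 7. Each token is examined once regardless of how many patterns are loaded, which is the usual motivation for merging patterns into one automaton.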

28. A computer-readable storage medium storing program code for causing a computer to perform the steps of:
- identifying tokens within an incoming byte stream, the tokens being lexical constructs for a specific language;
  
  identifying patterns of tokens;
  
  generating a parse tree from the identified patterns of tokens; and
  
  identifying the presence of potential exploits within the parse tree, wherein said identifying tokens, identifying patterns of tokens, and identifying the presence of potential exploits are based upon a set of rules for the specific language.

29. A method for scanning content, comprising:
- expressing an exploit in terms of patterns of tokens and rules, where tokens are lexical constructs of a specific programming language, and rules are sequences of tokens that form programmatical constructs; and
  
  parsing an incoming byte source to determine if an exploit is present therewithin, based on said expressing.
- View Dependent Claims (30, 31, 32, 33, 34, 35)
  - 30. The method of claim 29 further comprising generating a parse tree for the incoming byte source, the nodes of the parse tree corresponding to tokens and rules.
  - 31. The method of claim 30 wherein nodes of the parse tree corresponding to rules are positioned as parent nodes, the children of which correspond to the sequences of tokens that correspond to the rules.
  - 32. The method of claim 31 wherein a new parent node is added to the parse tree if a rule is matched.
  - 33. The method of claim 32 wherein said parsing determines if an exploit is present within the incoming byte source when a new parent node is added to the parse tree.
  - 34. The method of claim 33 wherein tokens and rules have names associated therewith, and further comprising assigning values to nodes in the parse tree, the value of a node corresponding to a token being the name of the corresponding token, and the value of a node corresponding to a rule being the name of the corresponding rule.
  - 35. The method of claim 34 further comprising storing an indicator for the matched rule in the new parent node of the parse tree, if said parsing determines the presence of the matched rule.
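
Claims 30 to 35 describe how the parse tree grows: rule nodes become parents of the token sequences they match, node values are token or rule names, and an indicator is stored in the new parent when a rule matches. The sketch below is a simplified shift-and-reduce loop with no lookahead or parse table, written only to illustrate that structure. The rule set `RULES`, the `Node` fields, and the decision to flag the `call` rule as an exploit are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    value: str                                  # token name or rule name
    children: List["Node"] = field(default_factory=list)
    matched_rule: Optional[str] = None          # indicator set when a rule matched


# Hypothetical rules: (rule name, sequence of token or rule names, flag as exploit?)
RULES = [
    ("member", ("IDENT", "DOT", "IDENT"), False),
    ("call", ("member", "LPAREN", "STRING", "RPAREN"), True),
]


def parse(token_names, rules=RULES):
    """Shift each token, then reduce while the top of the stack matches a rule."""
    stack: List[Node] = []
    alerts: List[Node] = []
    for name in token_names:
        stack.append(Node(name))                # shift
        reduced = True
        while reduced:
            reduced = False
            for rule_name, rhs, is_exploit in rules:
                n = len(rhs)
                top = tuple(node.value for node in stack[-n:])
                if len(stack) >= n and top == rhs:
                    # Reduce: the matched nodes become children of a new parent.
                    parent = Node(rule_name, stack[-n:], matched_rule=rule_name)
                    del stack[-n:]
                    stack.append(parent)
                    if is_exploit:              # checked when the parent is added
                        alerts.append(parent)
                    reduced = True
                    break
    return stack, alerts
```

Parsing the token names `IDENT DOT IDENT LPAREN STRING RPAREN SEMI` leaves a stack whose node values are `call` and `SEMI`, and returns one alert: the `call` node, whose first child is the `member` node built by the earlier reduction.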

36. A system for scanning content, comprising:
- a parser for parsing an incoming byte source to determine if an exploit is present therewithin, based on a formal description of the exploit expressed in terms of patterns of tokens and rules, where tokens are lexical constructs of a specific programming language, and rules are sequences of tokens that form programmatical constructs.
- View Dependent Claims (37, 38, 39, 40, 41, 42)
  - 37. The system of claim 36 wherein said parser comprises a tree generator for generating a parse tree for the incoming byte source, the nodes of the parse tree corresponding to tokens and rules.
  - 38. The system of claim 37 wherein nodes of the parse tree corresponding to rules are positioned as parent nodes, the children of which correspond to the sequences of tokens that correspond to the rules.
  - 39. The system of claim 38 wherein said tree generator adds a new parent node to the parse tree if a rule is matched.
  - 40. The system of claim 39 wherein said parser determines if a matched rule is present within the incoming byte source when said tree generator adds a new parent node to the parse tree.
  - 41. The system of claim 40 wherein tokens and rules have names associated therewith, and wherein said tree generator assigns values to nodes in the parse tree, the value of a node corresponding to a token being the name of the corresponding token, and the value of a node corresponding to a rule being the name of the corresponding rule.
  - 42. The system of claim 41 wherein said tree generator stores an indicator for the matched rule in the new parent node of the parse tree, if said parser determines the presence of the matched rule.

43. A computer-readable storage medium storing program code for causing a computer to perform the steps of:
- expressing an exploit in terms of patterns of tokens and rules, where tokens are lexical constructs of a specific programming language, and rules are sequences of tokens that form programmatical constructs; and
  
  parsing an incoming byte source to determine if an exploit is present therewithin, based on said expressing.

Current Assignee
Finjan Holdings, Inc. (SoftBank Group Corp.)
Original Assignee
Finjan, Inc. (SoftBank Group Corp.)
Inventors
Shaked, Amit; Yermakov, Alexander; Touboul, Shlomo; Rubin, Moshe; Matitya, Moshe; Melnick, Artem

Granted Patent

US 8,225,408 B2
US Class Current

713/187
CPC Class Codes

G06F 21/562   Static detection

G06F 21/563   by source code analysis

G06F 2221/2119   Authenticating web pages, e...

G06F 8/427   Parsing

H04L 63/0227   Filtering policies mail mes...

H04L 63/0245   Filtering by information in...

H04L 63/145   the attack involving the pr...

H04L 63/168   above the transport layer
