Method of partitioning data records

US 7,272,612 B2
Filed: 01/30/2004
Issued: 09/18/2007
Est. Priority Date: 09/28/1999
Status: Expired due to Term

First Claim

1. A computer-implemented method of partitioning data records in a computer into groups, comprising the steps of:

(a) defining a function of a distribution of values of a designated variable associated with the data records, wherein the function comprises a combination of measures of entropy and adjacency, adjacency being weighted by a weighting factor;

(b) partitioning the values of the designated variable into two or more groups, wherein a value of the function is determined by applying an optimization procedure; and

(c) assigning a data record to a group according to the values of the designated variable.

Abstract

A tree-structured index to multidimensional data is created using occurring patterns and clusters within the data which permit efficient search and retrieval strategies in a database of DNA profiles. A search engine utilizes hierarchical decomposition of the database by identifying clusters of similar DNA profiles and maps to parallel computer architecture, allowing scale up past previously feasible limits. Key benefits of the new method are logarithmic scale up and parallelization. These benefits are achieved by identification and utilization of occurring patterns and clusters within stored data. The patterns and clusters enable the stored data to be partitioned into subsets of roughly equal size. The method can be applied recursively, resulting in a database tree that is balanced, meaning that all paths or branches through the tree have roughly the same length. The method achieves high performance by exploiting the natural structure of the data in a manner that maintains balanced trees. Implementation of the method maps to parallel computer architectures, allowing scale up to very large databases.
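
The recursive, balanced partitioning described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: records are plain integers, and a median split stands in for the pattern- and cluster-based partitioning the abstract describes. The names `build_tree`, `depth`, and `search` are hypothetical.

```python
def build_tree(records, leaf_size):
    """Recursively split records into two halves of roughly equal size."""
    if len(records) <= leaf_size:
        return {"leaf": list(records)}
    ordered = sorted(records)
    mid = len(ordered) // 2
    return {
        "pivot": ordered[mid],  # records >= pivot go to the right branch
        "left": build_tree(ordered[:mid], leaf_size),
        "right": build_tree(ordered[mid:], leaf_size),
    }


def depth(node):
    """Length of the longest path from this node down to a leaf."""
    if "leaf" in node:
        return 0
    return 1 + max(depth(node["left"]), depth(node["right"]))


def search(node, key):
    """Follow one branch per level, then scan a single small leaf."""
    while "leaf" not in node:
        node = node["left"] if key < node["pivot"] else node["right"]
    return key in node["leaf"]


tree = build_tree(range(1024), leaf_size=16)
tree_depth = depth(tree)
found = search(tree, 37)
missing = search(tree, 5000)
```

Because each split halves the data, the number of levels grows with the logarithm of the record count, which is the scaling behaviour the abstract claims. Subtrees are independent of one another, so in principle they could be built or searched on separate processors.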

61 Citations

23 Claims

1. A computer-implemented method of partitioning data records in a computer into groups, comprising the steps of:
- (a) defining a function of a distribution of values of a designated variable associated with the data records, wherein the function comprises a combination of measures of entropy and adjacency, adjacency being weighted by a weighting factor;
  
  (b) partitioning the values of the designated variable into two or more groups, wherein a value of the function is determined by applying an optimization procedure; and
  
  (c) assigning a data record to a group according to the values of the designated variable.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
- - 2. A method as recited in claim 1 wherein said partitioning comprises partitioning of data records into groups of approximately equal size.
  - 3. A method as recited in claim 1 further comprising the step of selecting a partition from many computed solutions yielding acceptable performance.
  - 4. A method as recited in claim 1 wherein said optimization procedure results in an optimal assignment.
  - 5. A method as recited in claim 1 wherein said combination is linear.
  - 6. A method as recited in claim 1 wherein the designated variable simultaneously comprises a plurality of values.
  - 7. A method as recited in claim 1 wherein the designated variable corresponds to a designated DNA locus.
  - 8. A method as recited in claim 1 wherein the data records are applicable to agriculture.
  - 9. A method as recited in claim 1 wherein the data records are applicable to forensic science.
  - 10. A method as recited in claim 9 where the forensic science application includes DNA analysis.
  - 11. A method as recited in claim 1 wherein the data records are applicable to space science.
  - 12. A method as recited in claim 1 wherein the data records comprise references to textual information.
  - 13. A method as recited in claim 1 wherein the value of the function is minimized.
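
Claims 1, 2, 5, and 13 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the patent's own definition: the entropy measure is taken to be the Shannon entropy of the share of value occurrences falling in each group, the adjacency measure is taken to be the fraction of records whose values land in more than one group, the two are combined linearly as `-entropy + weight * split_fraction`, and exhaustive search stands in for the optimization procedure. The names `partition_cost` and `best_partition` are hypothetical.

```python
import math
from itertools import product


def entropy(counts):
    """Shannon entropy, in bits, of the proportions implied by counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def partition_cost(records, assignment, n_groups, weight):
    """Assumed cost: negative group-size entropy plus weighted split fraction."""
    group_counts = [0] * n_groups
    split = 0
    for rec in records:
        for value in rec:
            group_counts[assignment[value]] += 1
        # A record is "split" when its values fall into different groups.
        if len({assignment[value] for value in rec}) > 1:
            split += 1
    return -entropy(group_counts) + weight * split / len(records)


def best_partition(records, n_groups, weight):
    """Try every value-to-group assignment and keep the lowest cost."""
    values = sorted({value for rec in records for value in rec})
    best, best_cost = None, math.inf
    for labels in product(range(n_groups), repeat=len(values)):
        assignment = dict(zip(values, labels))
        cost = partition_cost(records, assignment, n_groups, weight)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


# Each record carries two values of the designated variable (compare claim 6).
records = [(1, 2), (1, 2), (3, 4), (3, 4)]
assignment, cost = best_partition(records, n_groups=2, weight=1.0)
```

In this toy input the lowest-cost assignment keeps values 1 and 2 together and values 3 and 4 together, giving two groups of equal size with no record split between them. Exhaustive search examines `n_groups ** len(values)` assignments, so it is only practical for very small value sets; a real optimization procedure would need to be far more selective.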

14. A computer-implemented method of partitioning data records of a database in a computer, wherein the database is indexed using a tree of nodes, wherein the tree of nodes comprises a root node which is connected to two or more branches originating at the root node, wherein each branch terminates at a node, wherein each node other than the root node is a non-terminal node or a leaf node, wherein each non-terminal node is connected to two or more branches originating at the non-terminal node and terminating at a node, wherein the tree-structured index comprises one or more queries associated with each non-terminal node, said method comprising the steps of:
- (a) identifying occurring sets of clusters in the data records of the database;
  
  (b) defining for each identified set of clusters a query that evaluates one of a Boolean expression or a decision tree and assigns each data record within the set of clusters, wherein said queries are determined by a combination of measures of entropy and adjacency, adjacency being weighted by a weighting factor; and
  
  (c) associating each query defined in step (b) with a non-terminal node and an associated set of clusters identified in step (a), and associating with each cluster within the set of clusters one branch originating at the non-terminal node, said branch forming part of one or more paths leading to leaf nodes comprising the data records assigned to the cluster by the query.
- Dependent Claims (15, 16, 17, 18, 19, 20, 21, 22, 23)
- - 15. A method as recited in claim 14 wherein said partitioning comprises partitioning of data records into groups of approximately equal size.
  - 16. A method as recited in claim 14 wherein said combination is linear.
  - 17. A method as recited in claim 14 wherein the data corresponds to DNA.
  - 18. A method as recited in claim 14 wherein the database is applicable to agriculture.
  - 19. A method as recited in claim 14 wherein the database is applicable to forensic science.
  - 20. A method as recited in claim 14 wherein the database is applicable to space science.
  - 21. A method as recited in claim 14 comprising creating a tree-structured index for a database of a computer.
  - 22. A method as recited in claim 14 comprising defining a partition of data records of the database using entropy/adjacency partition assignment.
  - 23. A method as recited in claim 14 both data clustering and entropy-adjacency partitioning being used in the same tree of nodes.
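
The tree of nodes in claim 14 can be pictured with a small data-structure sketch. It is illustrative only and rests on assumptions: the `Node` class, the `locus_a` and `locus_b` field names, and the threshold queries are all hypothetical, and the queries here are written by hand rather than derived from clusters or from an entropy/adjacency measure as the claim requires.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    # A non-terminal node carries a query mapping a record to a branch index.
    # A leaf node has query=None and holds the records assigned to it.
    query: Optional[Callable[[dict], int]] = None
    children: list = field(default_factory=list)
    records: list = field(default_factory=list)


def descend(root, record):
    """Evaluate the query at each non-terminal node until a leaf is reached."""
    node = root
    while node.query is not None:
        node = node.children[node.query(record)]
    return node


def insert(root, record):
    descend(root, record).records.append(record)


def find(root, target):
    """Only the one leaf selected by the queries has to be scanned."""
    return [r for r in descend(root, target).records if r == target]


# Each query is a Boolean expression converted to a branch index (0 or 1).
leaves = [Node() for _ in range(4)]
low = Node(query=lambda r: int(r["locus_b"] >= 5), children=leaves[:2])
high = Node(query=lambda r: int(r["locus_b"] >= 5), children=leaves[2:])
root = Node(query=lambda r: int(r["locus_a"] >= 10), children=[low, high])

for rec in [
    {"locus_a": 3, "locus_b": 2},
    {"locus_a": 3, "locus_b": 9},
    {"locus_a": 12, "locus_b": 2},
    {"locus_a": 15, "locus_b": 7},
]:
    insert(root, rec)

hits = find(root, {"locus_a": 15, "locus_b": 7})
```

Each branch corresponds to one outcome of the query at its parent node, and every path from the root ends at a leaf holding the records that the queries along that path assigned to it.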

Current Assignee: University of Tennessee Research Foundation (University of Tennessee)
Original Assignee: University of Tennessee Research Foundation (University of Tennessee)
Inventors: Birdwell, John D.; Wang, Tse-Wei; Horn, Roger D.; Yadav, Puneet; Icove, David J.
Primary Examiner(s): Wong, Don
Assistant Examiner(s): Nguyen, Merilyn P
Application Number: US10/767,730
Publication Number: US 20040186846A1
Time in Patent Office: 1,327 Days
Field of Search: 707/4, 707/5, 707/6, 707/101, 707/100, 707/104.1, 706/12, 703/2, 715/850
US Class Current: 1/1
CPC Class Codes:
- G06F 16/2246   Trees, e.g. B+trees
- G06F 16/2264   Multidimensional index stru...
- G06F 16/285   Clustering or classification
- G16B 40/00   ICT specially adapted for b...
- G16B 40/30   Unsupervised data analysis
- G16B 50/00   ICT programming tools or da...
- G16B 50/20   Heterogeneous data integration
- Y10S 707/99932   Access augmentation or opti...
- Y10S 707/99933   Query processing, i.e. sear...
- Y10S 707/99935   Query augmenting and refini...
- Y10S 707/99942   Manipulating data structure...
- Y10S 707/99945   Object-oriented database st...
