Parallel data processing architecture
Abstract
A tree-structured index to multidimensional data is created using naturally occurring patterns and clusters within the data, which permit efficient search and retrieval strategies in a database of DNA profiles. A search engine uses a hierarchical decomposition of the database, obtained by identifying clusters of similar DNA profiles, and maps onto parallel computer architectures, allowing scale-up beyond previously feasible limits. The key benefits of the new method are logarithmic scale-up (search cost grows roughly with the logarithm of the number of stored profiles) and parallelization. These benefits are achieved by identifying and exploiting naturally occurring patterns and clusters within the stored data, which allow the data to be partitioned into subsets of roughly equal size. Applied recursively, the method produces a balanced database tree, meaning that all paths or branches through the tree have roughly the same length. The method achieves high performance by exploiting the natural structure of the data in a manner that keeps the tree balanced. Implementation of the method maps naturally to parallel computer architectures, allowing scale-up to very large databases.
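The recursive partitioning described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented method: profiles are assumed to be fixed-length integer vectors (for example, one value per locus), and each split simply cuts at the median of the coordinate with the widest spread, a kd-tree-style stand-in for the cluster identification the abstract describes. The names `build_tree`, `search`, `depth` and the `leaf_size` parameter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Profile = Tuple[int, ...]  # hypothetical: one integer value per locus


@dataclass
class Node:
    profiles: List[Profile] = field(default_factory=list)  # filled only at leaves
    axis: int = -1       # coordinate used to split this node
    threshold: int = 0   # left subtree <= threshold <= right subtree on `axis`
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(profiles: Sequence[Profile], leaf_size: int = 8) -> Node:
    """Recursively split at the median of the widest coordinate, giving equal halves."""
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    if len(profiles) <= leaf_size:
        return Node(profiles=list(profiles))
    dims = len(profiles[0])
    spread = lambda d: max(p[d] for p in profiles) - min(p[d] for p in profiles)
    axis = max(range(dims), key=spread)
    ordered = sorted(profiles, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(axis=axis, threshold=ordered[mid][axis],
                left=build_tree(ordered[:mid], leaf_size),
                right=build_tree(ordered[mid:], leaf_size))


def search(node: Node, query: Profile, tol: int = 0) -> List[Profile]:
    """Return stored profiles within `tol` of `query` on every coordinate."""
    if node.left is None:
        return [p for p in node.profiles
                if all(abs(a - b) <= tol for a, b in zip(p, query))]
    hits: List[Profile] = []
    # descend into a subtree only if it could contain a match
    if query[node.axis] - tol <= node.threshold:
        hits += search(node.left, query, tol)
    if query[node.axis] + tol >= node.threshold:
        hits += search(node.right, query, tol)
    return hits


def depth(node: Node) -> int:
    """Number of internal levels on the longest root-to-leaf path."""
    return 0 if node.left is None else 1 + max(depth(node.left), depth(node.right))
```

Because every split here halves the data exactly, 1,024 profiles with `leaf_size=8` give a tree of depth log2(1024/8) = 7, which illustrates the logarithmic behavior the abstract refers to.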
20 Claims
1. A parallel data processing system for search, storage and retrieval of data of a database responsive to client queries for specific data of said database, said parallel data processing system comprising:
a plurality of host processors including a root host processor, said root host processor being responsive to said client queries for said specific data of said database, wherein at least two host processors have a search engine and maintain information of a search queue of said client queries;
at least two host processors having a queue of search requests for specific data of said database, each of said host processors executing a search engine, communicating capacity and load information between host processors and said at least two host processors exchanging at least one search request, the search engine removing at least one search request from a search queue and generating an additional search request, each of said host and root host processors maintaining a list of available host processors and information about the capacity and load for each available host processor in memory and broadcasting its capacity and load information to other host processors and bringing its search queue into balance with another host processor according to a time constant in response to receipt of said broadcast capacity and load information; and
a communications system coupling said host and root processors, wherein at least two host processors communicate capacity and load information to other host processors;
selected host processors storing a database index for said database comprising nodes of a database tree for said database and data accessible via said nodes of said database tree.
Dependent claims: 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17.
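Claim 1 recites a search engine that removes a search request from its queue and generates an additional search request. One plausible reading, sketched below under stated assumptions, is a work queue over the index tree: processing a request for an interior node produces one new request per child whose key range could contain the query, and a request that reaches a leaf is resolved against the stored data. The range-keyed `TreeNode`, `SearchRequest`, `SearchEngine` and `build` are hypothetical; the claim does not specify how nodes are keyed.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Sequence, Tuple


@dataclass
class TreeNode:
    low: int   # smallest key stored beneath this node (hypothetical range keying)
    high: int  # largest key stored beneath this node
    children: List["TreeNode"] = field(default_factory=list)
    keys: List[int] = field(default_factory=list)  # data held only at leaves


def build(keys: Sequence[int], fanout: int = 4, leaf_size: int = 4) -> TreeNode:
    """Build a range tree over sorted, non-empty `keys` (fanout >= 2, leaf_size >= 1)."""
    if len(keys) <= leaf_size:
        return TreeNode(keys[0], keys[-1], keys=list(keys))
    step = -(-len(keys) // fanout)  # ceiling division: keys per child
    children = [build(keys[i:i + step], fanout, leaf_size)
                for i in range(0, len(keys), step)]
    return TreeNode(keys[0], keys[-1], children=children)


@dataclass
class SearchRequest:
    query_id: int
    key: int
    node: TreeNode  # the index node this request still has to examine


class SearchEngine:
    def __init__(self) -> None:
        self.queue: Deque[SearchRequest] = deque()
        self.results: List[Tuple[int, int]] = []  # (query_id, key) pairs found

    def step(self) -> bool:
        """Remove one request and either expand it or resolve it; False if idle."""
        if not self.queue:
            return False
        req = self.queue.popleft()
        if req.node.children:
            # interior node: generate one additional request per child that may hold the key
            for child in req.node.children:
                if child.low <= req.key <= child.high:
                    self.queue.append(SearchRequest(req.query_id, req.key, child))
        elif req.key in req.node.keys:
            self.results.append((req.query_id, req.key))
        return True
```

Because each generated request carries everything needed to continue (query, key, node), pending requests could be handed to another host processor, which is one way the request exchange recited in the claim might work.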
2. A parallel data processing system for search, storage and retrieval of data of a database responsive to client queries for specific data of said database, said parallel data processing system comprising:
a plurality of host processors including a root host processor, said root host processor being responsive to said client queries for said specific data of said database;
each of said host and root host processors maintaining a list of available host processors and information about the capacity and load for each available host processor in memory;
at least two host processors having a queue of search requests for specific data of said database, each of said host processors executing a search engine, communicating capacity and load information between host processors and said at least two host processors exchanging at least one search request, the search engine removing at least one search request from a search queue and generating an additional search request, and a communications system coupling said host and root processors, wherein at least two host processors communicate capacity and load information to other host processors and each have a search engine and each maintain load information of a search queue length of said client queries;
each of said at least two host processors broadcasting its capacity and search queue length load information to other host processors and bringing its search queue of said client queries into balance according to a time constant with another host processor in response to receipt of said broadcast capacity and load information;
selected host processors storing a database index for said database comprising nodes of a database tree for said database and data accessible via said nodes of said database tree wherein the plurality of host processors comprises three host processors, of which two host processors have search engines and maintain information of said search queue of said client queries and the third comprises said root host processor.
Dependent claim: 18.
3. A parallel data processing system for search, storage and retrieval of data of a database responsive to client queries for specific data of said database, said parallel data processing system comprising:
a plurality of host processors including a root host processor, said root host processor being responsive to said client queries for said specific data of said database;
each of said host and root host processors maintaining a list of available host processors and information about the capacity and load for each available host processor in memory;
at least two host processors having a queue of search requests for specific data of said database, each of said host processors executing a search engine, communicating capacity and load information between host processors and said at least two host processors exchanging at least one search request, the search engine removing at least one search request from a search queue and generating an additional search request, and a communications system coupling said host and root processors, wherein at least two host processors communicate capacity and load information to other host processors and have a search engine and maintain load information of a search queue length of said client queries;
each of said at least two host processors bringing its search queue of client queries into balance with another host processor according to a time constant in response to receipt of said broadcast capacity and load information;
selected host processors storing a database index for said database comprising nodes of a database tree for said database and data accessible via said nodes of said database tree wherein the plurality of host processors comprises two host processors, of which one comprises said root host processor and both said host processors have search engines and maintain information of said search queue of said client queries.
Dependent claims: 19, 20.
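Claims 1 to 3 recite host processors that bring their search queues into balance "according to a time constant" in response to broadcast capacity and load information. The sketch below assumes one reading of that phrase: a first-order relaxation in which, on each update interval `dt`, a host transfers about `dt/tau` of its excess work to a less-loaded peer, so the imbalance decays roughly like exp(-t/tau). Load is taken as queue length weighted by a relative capacity. `Host`, `broadcast`, `rebalance`, `simulate`, `dt` and `tau` are all hypothetical, and the root host's role of accepting client queries is omitted.

```python
import math
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class Host:
    name: str
    capacity: float  # hypothetical relative throughput of this host's search engine
    queue: Deque[int] = field(default_factory=deque)  # pending search request ids
    # name -> (capacity, queue length) as last broadcast by that peer
    peers: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def broadcast(self, hosts: List["Host"]) -> None:
        """Send this host's capacity and current queue length to every other host."""
        for h in hosts:
            if h is not self:
                h.peers[self.name] = (self.capacity, len(self.queue))

    def rebalance(self, peer: "Host", dt: float, tau: float) -> int:
        """Move part of this host's excess work to `peer`; return how many requests moved."""
        peer_cap, peer_len = self.peers[peer.name]  # broadcast info, possibly stale
        mine = len(self.queue)
        # queue length this host would hold if the pair's work were split by capacity
        target = (mine + peer_len) * self.capacity / (self.capacity + peer_cap)
        excess = mine - target
        if excess < 1:
            return 0  # balanced, or the peer is the overloaded side
        alpha = min(1.0, dt / tau)  # fraction of the imbalance corrected per interval
        n = min(math.floor(excess), math.ceil(excess * alpha))
        for _ in range(n):
            peer.queue.append(self.queue.pop())
        return n


def simulate(hosts: List[Host], steps: int, dt: float = 1.0, tau: float = 4.0) -> None:
    """Each round, every host broadcasts, then every host rebalances against each peer."""
    for _ in range(steps):
        for h in hosts:
            h.broadcast(hosts)
        for h in hosts:
            for p in hosts:
                if p is not h:
                    h.rebalance(p, dt, tau)
```

Starting from 100 requests on host A and none on host B, with equal capacities and `tau = 4`, A sends 13, 10, 7, ... requests in successive rounds until both queues hold 50. Acting on each peer's broadcast queue length, rather than querying the peer directly, mirrors the claims' reliance on broadcast load information.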
Specification