Similarity-based access control of data in a data processing system
Abstract
Similarity of data items is determined by analyzing corresponding segments of the data items. A function is applied to each segment of a data item and the output of that function is compared to the output of the same function applied to a corresponding segment of another data item. A function may be applied to the output of the functions. The functions may be hash or message digest functions.
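The segment-by-segment comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the fixed-size segmentation, the choice of SHA-256, and the names `segment_digests` and `matching_segments` are all assumptions made for the example.

```python
import hashlib


def segment_digests(data: bytes, segment_size: int = 4) -> list[bytes]:
    """Split data into fixed-size segments (an assumed scheme) and hash each one."""
    return [
        hashlib.sha256(data[i:i + segment_size]).digest()
        for i in range(0, len(data), segment_size)
    ]


def matching_segments(a: bytes, b: bytes, segment_size: int = 4) -> int:
    """Count the positions at which corresponding segments have equal digests."""
    digests_a = segment_digests(a, segment_size)
    digests_b = segment_digests(b, segment_size)
    return sum(x == y for x, y in zip(digests_a, digests_b))


# Two 12-byte items that differ only in their middle segment.
shared = matching_segments(b"aaaabbbbcccc", b"aaaaXXXXcccc")
```

Here `shared` counts the corresponding segments whose digests agree. How that count is turned into a similarity decision is left open in this sketch.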
56 Claims
1. A computer-implemented method, the method comprising:
(A) for a first data item comprising a first plurality of parts,
(a1) applying a first function to each part of said first plurality of parts to obtain a corresponding part value for each part of said first plurality of parts, wherein each part of said first plurality of parts comprises a corresponding sequence of bits, and wherein the part value for each particular part of said first plurality of parts is based, at least in part, on the corresponding bits in the particular part, and wherein two identical parts will have the same part value as determined using said first function, wherein said first function comprises a first hash function; and
(a2) obtaining a first value for the first data item, said first value obtained by applying a second function to the part values of said first plurality of parts of said first data item, said second function comprising a second hash function;
(B) for a second data item comprising a second plurality of parts,
(b1) applying said first function to each part of said second plurality of parts to obtain a corresponding part value for each part of said second plurality of parts, wherein each part of said second plurality of parts consists of a corresponding sequence of bits, and wherein the part value for each particular part of said second plurality of parts is based, at least in part, on the corresponding bits in the particular part of the second plurality of parts; and
(b2) obtaining a second value for the second data item by applying said second function to the part values of said second plurality of parts of said second data item; and
(C) ascertaining whether or not said first data item corresponds to said second data item based, at least in part, on said first value and said second value.
Dependent claims: 2, 3, 4, 5, 6, 7
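The two-level scheme recited in claim 1 (a first hash per part, then a second hash over the part values) can be sketched as below. The sketch rests on several assumptions: SHA-256 as the first hash function, SHA-1 as the second, concatenation as the way part values are fed to the second function, and strict equality as the correspondence test. The claim itself requires only that the determination be based "at least in part" on the two values, and the function names are hypothetical.

```python
import hashlib


def part_value(part: bytes) -> bytes:
    """First function (assumed SHA-256): identical parts yield identical values."""
    return hashlib.sha256(part).digest()


def item_value(parts: list[bytes]) -> bytes:
    """Second function (assumed SHA-1) applied to the concatenated part values."""
    return hashlib.sha1(b"".join(part_value(p) for p in parts)).digest()


def corresponds(first_parts: list[bytes], second_parts: list[bytes]) -> bool:
    """Step (C), assuming correspondence means the two item values are equal."""
    return item_value(first_parts) == item_value(second_parts)


same = corresponds([b"ab", b"cd"], [b"ab", b"cd"])
different = corresponds([b"ab", b"cd"], [b"ab", b"ce"])
```

Because every part value has a fixed length, concatenating them before the second hash keeps the part boundaries unambiguous in this sketch.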
8. A computer-implemented method comprising:
(A) maintaining a database of values, at least one value for each data item of a plurality of data items, wherein each data item of the plurality of data items comprises a corresponding one or more parts, and wherein each of the one or more parts of each data item comprises a corresponding sequence of bits, and wherein each of the one or more parts of each data item has a corresponding part value, the part value for each particular part being based on a first given function of the corresponding sequence bits for that particular part, wherein two identical parts will have the same part value as determined using the first given function, and the value for each particular data item being based, at least in part, on a second given function of the part values of the one or more parts of that particular data item, and wherein the first given function comprises a first hash function, and the second given function comprises a second hash function;
(B) obtaining a second value, the second value corresponding to a second data item, the second data item comprising a corresponding one or more parts, each of the one or more parts of the second data item comprising a corresponding sequence of bits, each of the one or more parts of the second data item having a corresponding part value, wherein the part value for each particular part of the second data item is based on the first given function of the corresponding sequence of bits in that particular part of the second data item; and wherein the second value is based on the second function of the one or more part values of the second data item; and
(C) ascertaining whether or not the second data item corresponds to any of the plurality of data items, based, at least in part, on whether or not the second value corresponds to any value in the database of values.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
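The database check in claim 8 can be illustrated with an in-memory set of item values. This is a hedged sketch only: the class name `ValueDatabase`, the use of SHA-256 for both hash levels, and the exact-match membership test are assumptions, not details taken from the claim.

```python
import hashlib


def item_value(parts: list[bytes]) -> str:
    """Hash each part, then hash the concatenated part values (both assumed SHA-256)."""
    part_values = [hashlib.sha256(p).digest() for p in parts]
    return hashlib.sha256(b"".join(part_values)).hexdigest()


class ValueDatabase:
    """Hypothetical stand-in for the claimed 'database of values'."""

    def __init__(self) -> None:
        self._values: set[str] = set()

    def add(self, parts: list[bytes]) -> str:
        """Step (A): store at least one value for a data item."""
        value = item_value(parts)
        self._values.add(value)
        return value

    def contains_value(self, value: str) -> bool:
        """Step (C), assuming 'corresponds' means an exact match."""
        return value in self._values


db = ValueDatabase()
db.add([b"x", b"y"])
second_value = item_value([b"x", b"y"])  # step (B): obtained for a second data item
known = db.contains_value(second_value)
```

Note that step (B) only requires obtaining the second value. In this sketch the lookup needs the value alone, not the parts of the second data item.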
22. A computer-implemented method comprising:
(A) obtaining a particular data item value corresponding to a particular data item, the particular data item comprising a corresponding one or more parts, each of the one or more parts of the particular data item comprising a corresponding sequence of bits, each of the one or more parts of the particular data item having a corresponding part value, wherein the part value for each specific part of the one or more parts of the particular data item is based, at least in part, on a first given function of the corresponding sequence of bits in that specific part of the particular data item; wherein two identical parts will have the same part value as determined using the first given function, and wherein the particular data item value is based, at least in part, on a second given function of the one or more part values of the particular data item, wherein the first given function comprises a first hash function, and the second given function comprises a second hash function; and
(B) ascertaining whether or not the particular data item corresponds to any of a plurality of data items, based, at least in part, on whether or not the particular data item value corresponds to any value in a database of data item values, wherein the database of data item values comprises at least one data item value for each data item of the plurality of data items, wherein each data item of the plurality of data items comprises a corresponding one or more parts, and wherein each of the one or more parts of each data item of the plurality of data items comprises a corresponding sequence of bits, and wherein each of the one or more parts of each data item of the plurality of data items has a corresponding part value, the part value for each particular part of the one or more parts of each data item being based on the first given function of the corresponding sequence bits for that particular part, the data item value for each particular data item being based, at least in part, on the second given function of the part values of the one or more parts of that particular data item.
Dependent claims: 23, 24, 25, 26, 27, 28, 29, 30, 31
32. A computer-implemented method comprising:
(A) maintaining a database comprising a mapping of data item keys to corresponding data item information for each of a plurality of data items, wherein each data item of the plurality of data items has at least one data item key, wherein each data item of the plurality of data items comprises a corresponding one or more portions, and wherein each of the one or more portions of each data item comprises a corresponding sequence of bits, and wherein each of the one or more portions of each data item has a corresponding portion value, the portion value for each particular portion being based on a first given function of the corresponding sequence bits for that particular portion, wherein the first given function comprises a first hash function, and wherein two identical portions will have the same portion value as determined using the first given function, the particular data item key for each particular data item being based on a second given function of the portion values of the one or more portions of that particular data item, wherein the second given function comprises a second hash function;
(B) obtaining a particular value, the particular value having been determined from a corresponding one or more particular portions, each of the one or more particular portions comprising a corresponding sequence of bits, each of the one or more particular portions having a corresponding portion value, wherein the portion value for each specific portion of the one or more particular portions is based on the first given function of the corresponding bits in that specific portion; and wherein the particular value is based on the second function of the portion values of the one or more particular portions; and
(C) using the particular value and the database to ascertain whether or not the one or more particular portions correspond to any of the plurality of data items.
Dependent claims: 33, 34, 35
36. A computer-implemented method comprising:
(A) for each particular data item of a plurality of data items;
(a1) determining a corresponding particular data item key; and
(a2) adding an entry to a database to map said particular data item key to information about the particular data item, wherein each data item of the plurality of data items comprises a corresponding one or more parts, and wherein each of the one or more parts of each data item comprises a corresponding sequence of bits, and wherein each of the one or more parts of each data item has a corresponding part value, the part value for each particular part being based on a first given function of the corresponding sequence bits for that particular part, wherein the first given function comprises a first hash function, wherein two identical parts will have the same part value as determined using the first given function, the data item key for each particular data item being based on a second given function of the part values of the one or more parts of that data item, wherein the second given function comprises a second hash function;
(B) determining a second key value, the second key value being based on one or more particular parts, each of the one or more particular parts comprising a corresponding sequence of bits, each of the one or more particular parts having a corresponding part value, wherein the part value for each specific part of the one or more particular parts is based on the first given function of the corresponding bits in that specific part; and wherein the second key value is based on the second function of the part values of the one or more particular parts; and
(C) using the second key value and the database to ascertain whether or not the one or more particular parts correspond to any of the plurality of data items.
Dependent claims: 37, 38, 39, 40, 41, 42, 43, 44, 45
46. A computer-implemented method comprising:
(A) for each particular file of a plurality of files;
(a1) determining a particular digital key for the particular file, wherein the particular file comprises a first one or more parts, each part of said first one or more parts having a corresponding part value, the part value of each specific part of said first one or more parts being based on a first function of the contents of the specific part, wherein two identical parts will have the same part value as determined by the first function, and wherein the particular digital key for the particular file is determined using a second function of the one or more of part values of said first one or more parts; and
(a2) adding the particular digital key of the particular file to a database, the database including a mapping from digital keys of files to information about the corresponding files;
(B) determining a search key based on search criteria, wherein the search criteria comprise a second one or more parts, each of said second one or more parts of said search criteria having a corresponding part value, the part value of each specific part of said second one or more parts being based on the first function of the contents of the specific part, and wherein the search key is determined using the second function of the one or more of part values of said second one or more parts;
(C) attempting to match the search key with a digital key in the database; and
(D) if the search key matches a particular digital key in the database, providing information about the file corresponding to the particular digital key.
Dependent claims: 47, 48, 49, 50, 51
52. A computer-implemented method comprising:
(A) for each particular file of a plurality of files;
(a1) determining a corresponding particular file key; and
(a2) adding an entry to a database to map said particular file key to information about the particular file, the information about the particular file including one or more locations of the particular file, wherein each file of the plurality of files comprises a corresponding one or more parts, and wherein each of the one or more parts of each file has a corresponding part value, the part value for each particular part being based on a first hash function of that particular part, wherein two identical parts will have the same part value as determined using the first hash function, the file key for each particular file being based on a second hash function of the part values of the one or more parts of that file;
(B) determining a second key value, the second key value being based on one or more particular parts, each of the one or more particular parts having a corresponding part value, wherein the part value for each specific part of the one or more particular parts is based on the first hash function of that specific part; and wherein the second key value is based on the second hash function of the part values of the one or more particular parts; and
(C) comparing the second key value to key values in the database to ascertain whether or not the one or more particular parts correspond to any of the plurality of files.
Dependent claims: 53, 54, 55, 56
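The key-to-information mapping of claim 52, where the stored information includes file locations, might look like the sketch below. Everything concrete here is an assumption for illustration: the dictionary layout, the helper names `file_key`, `register`, and `lookup`, the example paths, and the use of SHA-256 for both hash levels.

```python
import hashlib
from typing import Optional


def file_key(parts: list[bytes]) -> str:
    """Second hash over the first-hash values of the parts (both assumed SHA-256)."""
    part_values = [hashlib.sha256(p).digest() for p in parts]
    return hashlib.sha256(b"".join(part_values)).hexdigest()


def register(db: dict, parts: list[bytes], location: str) -> None:
    """Steps (a1) and (a2): map the file key to information that includes locations."""
    entry = db.setdefault(file_key(parts), {"locations": []})
    entry["locations"].append(location)


def lookup(db: dict, parts: list[bytes]) -> Optional[dict]:
    """Steps (B) and (C): derive a second key value and compare it to stored keys."""
    return db.get(file_key(parts))


db: dict = {}
register(db, [b"p1", b"p2"], "/srv/a.bin")        # hypothetical location
register(db, [b"p1", b"p2"], "/mnt/copy/a.bin")   # same content, second location
info = lookup(db, [b"p1", b"p2"])
```

In this sketch two files with identical parts share one key, so registering the same content twice extends the location list of a single entry rather than creating a second one.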
Specification