Bloom filters seem to be popping up in the FOAF world quite a bit recently. Henry Story has been posting lots of good thoughts on this (see “My Bloomin’ Friends” from this past August as well as more recent postings on foaf-dev). Henry used an example Bloom filter vocabulary in his discussion that looked something like this:
a :Bloom ;
:length 1000 ;
I asked him whether he had developed an actual vocabulary for this or knew of any similar work, but I didn’t find anything published.
I’d like to flesh this idea out a bit and get a usable vocabulary out of it. I think what Henry came up with looks good, but it probably needs another predicate to identify the algorithm used to generate the filter. This would allow the vocabulary to be used across languages and environments that might rely on different hashing algorithms, and would let systems determine whether they can interact through an arbitrary filter.
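To make the interoperability point concrete, here is a minimal Python sketch of how a consumer might decide whether it can use a filter it has just parsed. Everything in it is an assumption for illustration: the `KNOWN_ALGORITHMS` registry, the `can_use_filter` helper, the dictionary shape of the parsed description, and the `example.org` algorithm URI are all hypothetical and not part of any published vocabulary.

```python
# Hypothetical registry: algorithm URI -> name of a local implementation.
# The URI below is a made-up placeholder, not a real algorithm identifier.
KNOWN_ALGORITHMS = {
    "http://example.org/bloom#salted-sha1": "salted_sha1",
}


def can_use_filter(description: dict) -> bool:
    """Return True only if the filter names an algorithm we implement.

    `description` stands in for the parsed RDF description of a filter.
    """
    return description.get("algorithm") in KNOWN_ALGORITHMS


# A filter that names a known algorithm can be used.
print(can_use_filter({"algorithm": "http://example.org/bloom#salted-sha1", "length": 1000}))

# A filter with no algorithm predicate cannot be interpreted safely.
print(can_use_filter({"length": 1000}))
```

Without the extra predicate, the second case is the only one available: the consumer has the bits but no way to know how to probe them.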
Here’s what a simple filter might look like as emitted by the LOAF tools:
a :Bloom ;
:algorithm <http://loaf.cantbedone.org/#filter> ;
:length 1000 ;
The idea here is that the value of the :algorithm predicate would indicate what algorithm was used to compute the filter. The value stashed in :algorithm might imply its own set of parameters or allow for additional algorithm-specific predicates; based on the LOAF example, such additional information would include the underlying hashing algorithm for each hash (such as “sha1;salt=a”, “sha1;salt=b”, …). Of course, the :algorithm value should really be defined by the person who’s actually defining the algorithm (so my defining one for the LOAF algorithm in their namespace is bad form, but hopefully the idea is clear).
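As a rough illustration of why these parameters matter, here is a small Python sketch of a Bloom filter driven by salted SHA-1 hashes. It is not the LOAF implementation: the way the salt is combined with the item (simple prefixing), the reduction of the digest modulo the filter length, and the bit ordering within each byte are all assumptions made for this example. They are exactly the kind of detail an :algorithm value would have to pin down before two implementations could share a filter.

```python
import hashlib


def salted_sha1_positions(item: str, salts: list, length: int) -> list:
    """Return one bit position per salt.

    Assumption: each hash is SHA-1 over salt + item, read as a big-endian
    integer and reduced modulo the filter length.
    """
    positions = []
    for salt in salts:
        digest = hashlib.sha1((salt + item).encode("utf-8")).digest()
        positions.append(int.from_bytes(digest, "big") % length)
    return positions


class BloomFilter:
    """A fixed-length bit array probed by one salted SHA-1 hash per salt."""

    def __init__(self, length: int, salts: list):
        self.length = length
        self.salts = salts
        self.bits = bytearray((length + 7) // 8)  # round up to whole bytes

    def add(self, item: str) -> None:
        for pos in salted_sha1_positions(item, self.salts, self.length):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in salted_sha1_positions(item, self.salts, self.length)
        )


friends = BloomFilter(length=1000, salts=["a", "b", "c"])
friends.add("mailto:alice@example.org")
print(friends.might_contain("mailto:alice@example.org"))
```

Changing any one of those assumptions (a different salt, a different byte order) yields a different bit pattern for the same members, so a filter is only meaningful to a reader that knows which choices were made.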
Bringing this back to Henry’s example, a FOAF file might then do something like this (modulo the appropriate OpenID login stuff):
<public> rdfs:seeAlso <protected> .
<protected> :readableBy [
a foaf:Group ;
a bloom:Bloom ;
bloom:algorithm <http://blogs.sun.com/bblfish/entry/my_bloomin_friends#applet_filter> ;
bloom:length 400 ;
Once they log in, this would allow anyone in the (hidden) group to retrieve more data from <protected> than was available at <public>. I’m not sure how it would play with TAG guidelines, but this could also be used to provide varying levels of data at <public> depending on whether the client has logged in and what credentials they might have.
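The server-side decision this implies can be sketched in a few lines of Python. The `select_document` function and the identities below are hypothetical, and a plain set stands in for the real filter so that the example runs on its own. In practice the `might_contain` argument would be the membership test of the Bloom filter attached to the group.

```python
from typing import Callable


def select_document(identity: str, might_contain: Callable[[str], bool]) -> str:
    """Pick which document a logged-in client may read.

    `might_contain` is the membership test of the group's filter.
    """
    return "<protected>" if might_contain(identity) else "<public>"


# Stand-in for a real filter: an ordinary set of (made-up) identities.
# A real Bloom filter answers "possibly in the group" or "definitely not".
hidden_group = {"http://alice.example.org/"}

print(select_document("http://alice.example.org/", hidden_group.__contains__))
print(select_document("http://mallory.example.org/", hidden_group.__contains__))
```

One design consequence is worth keeping in mind: because a Bloom filter can return false positives, an identity outside the group will occasionally pass the test. How often depends on the filter length, the number of hashes, and the number of members.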
Of course, I have my own reasons for playing around with Bloom filters and RDF having to do with SPARQL, but I’ll save that for another post.