Friday, April 17, 2009

Decentralized Reputation II

(for latecomers - previous post)

I wasn't looking forward to coming up with rules for dealing with transitive trust, but as it turns out, I'm in luck - I stumbled upon subjective logic a few days ago. It basically adds a few new logical operators for dealing with opinions in a surprisingly clean way. This paper has a better explanation than the wikipedia article, if you're curious. There are two key points that I'm going to make use of throughout this post: it's possible to reason about the amount of trust you should assign to opinions even through an arbitrarily deep trust chain, and in order to perform this reasoning accurately it's critical that you have access to the entire chain, not just the transitive opinions of people you trust.

Before I start, there are two things that you absolutely have to know about before this post will make any sense: cryptographic hashes, and public-key cryptography. (CS majors and other such people can probably skip this paragraph. :) A hash function takes some arbitrary data, and generates from it a fixed-size block of data. The hash needs to have some properties: given some data, you can generate a hash, but given a hash, you can't find some chunk of data which hashes to the same thing, or tell anything about the data that went into the hash. With public-key cryptography, you have two keys, which correspond to two functions that are inverses of each other. Furthermore, you can assume that key pairs are unique. With public-key, you can do things like give somebody one of the functions, and encode data with the other one, which simultaneously protects the data (because it's encrypted), and proves that you are who you say you are (because you would have to have the corresponding private key, if the data decrypts correctly with the public key you gave them).

Identity

This identifying property of public-key crypto is tremendously useful. It means that, for as long as you can keep the private key secret, you can prove that all messages signed with your private key are actually from you. Thus, a public/private keypair is at the core of the idea of an identity. Identity using public key crypto is largely a solved problem, so I won't spend all that much time on it.

One thing that's unfortunately missing from current implementations is a good method of key revocation. Rather than fiddling around with revocation keys that have to be broadcast somehow, I'm going to introduce the concept of an "identity server". An identity server in this scheme is a server, which has your private key (or better yet, a proxy key) and can answer requests about it. Individuals will be able to run their own ID servers, of course, but somehow I think that not everybody will want to.

Reputation

So we have an identity system, with identity servers - great! Except, computers can generate keypairs pretty darn quickly these days, so identities are a dime a thousand. We need some way to distinguish between legitimate people and hordes of spambots. We'd also like a way to know if a given person's trustworthy, which turns out to be a very similar problem. Getting a meaningful measure of trust from raw data is kind of tricky, which is why I'm going to invoke Jøsang's work on subjective logic here. We can compute a meaningful measure of trust for an identity given a network of trust and opinion ratings from various users, such that there's at least one unbroken chain between us and the identity we're trying to evaluate.

The big question (how will this all work?) then becomes three much easier questions: where do these ratings come from?, where do we go to get these ratings?, and how will we find unbroken chains?

Where the ratings come from is easy - people are already able to judge the trustworthiness of the people they interact with regularly, which should be enough data for the system to function. As we all know, everybody is at most six degrees away from Kevin Bacon, so the maximum number of hops we'll have to go to find somebody is twelve.

Where to get the ratings is a bit trickier, but still doable. The problem is that people tend to go offline at random times, meaning that we'll need to store the ratings in the Internet somewhere. A distributed service like DNS would be good, but we'll still need a stable source for ratings, that the distributed system could then be a cache of. (Aside: in fact, because of the nature of digital signatures, the distributed cache system wouldn't even need to be trusted. If you were walking down the street one day, and a shady-looking guy stepped out of an alley and handed you a digitally signed document on a floppy disk, you could be 100% certain that the document was authentic, if you had access to the signer's public key. Neat, eh?) An obvious candidate is the identity servers, since those already need to be up all the time. In my scheme, a person's identity server would be responsible for storing and giving out all the opinions that people have had about that person.

Alright, so here's one of the nifty bits - this came to me at about 3 AM while I was trying to fall asleep, as good ideas sometimes do. It seems that if the identity server is responsible for maintaining all opinions about you, good or bad, then it has no real reason to keep the bad ones. It could easily tell whoever asked that nobody has ever said anything bad about you, and nobody would be the wiser. What we need is a way for somebody leaving an opinion to prove that it was accepted. The obvious way to do this is to have the identity server sign incoming opinions and give back the signature, but there's still vulnerability here - the identity server could pretend to be suddenly deaf when you're saying mean things. On the other hand, we can imagine a three-step exchange, where you hand the server a hash of the opinion you want to give it, the server signs that and gives it back, and then you give the server the opinion. Remember, cryptographic hashes are opaque, so this means that the server is forced to accept all opinions. The user creating the opinion can simultaneously submit it to the distributed cache - this would ensure that an identity server which drops negative opinions can be discovered.

As for the third question - how are trust chains discovered? - this one is a little difficult. Finding optimal paths through a graph is easy if you have the entire graph lying around; not so much if it's spread out over thousands of computers, some of which are down at any given time. Ideally, we'd like to be able to find the best trust path between ourselves and some random other identity. However, we don't really need the best path; if a lot of paths exist, then some pretty good path is usually good enough. In that case, our slightly unreliable distributed cache suddenly looks pretty good - it has all the necessary data already, and with our relaxed requirements can give us exactly what we need.

So, there you have it. As far as I can think through, this proposal is mostly solid; I've spotted a few weak points already, and I might try to paper them over in a future post, but the core structure should stay the same. Even though this was originally a thought experiment, to prove that something like this is possible, I have a strange urge to actually implement it now ._.

2 comments:

Kiriska said...

Honors thesis in the making? :3

I'm proud that I actually remembered/already knew wtf hash functions and public-keys were and that I actually understood all of that. Woo! Still, I guess every time I start thinking that I should have never transferred in the first place, I just have to read one of your CS-centric posts. <_<

Joe Hopkins said...

Looks like a case for distributed caching with some intelligence.