Monday, December 27, 2010

Creating a Private Blog on a Free Blogging Service

There are a few problems with Facebook (and similar services):

  1. It's a walled garden
  2. You have no control over your data (even though you think you do)
  3. You're not the customer, you're the product

Free blogging sites get around problem 1, but not the other two. Generally, you have even less control over your data, since the whole point is to publish so that everyone in the world can potentially read it. To get around all of these problems, you'd need to host your own site, which can be a pain and costs more than it's worth to most people.

So, what can we do with a free blogging site? We can post encrypted articles, and only distribute the keys to the people we want reading them. Not only have we removed problem 1, since anyone with the appropriate crypto can read the articles, but we've also partially removed problem 2. Why partially? I'll get to that in a bit. We still have problem 3, but that's an economic reality for any free service. You can, however, shop around for a service that at least treats you as a product with dignity. You can also potentially find a paid blogging service that doesn't support encryption (or whose encryption you don't want to use), at which point you become the customer, and just a little more human in the eyes of the service.

Why might you not want to use a perfectly functional encryption service provided by a blog host? It's a question of who has the keys. It's almost certain that the host would have your encryption keys, and would provide the encryption and decryption on the fly. While convenient, it's still a loss of control, and they can hand your decrypted data to anyone they choose (though you may have some contract protections in this regard). It's also likely that they'll use password-based authentication. We're going to use public-key authentication, and we're going to do it in a way that's fairly easy and robust against forgotten passwords.

Let's consider the following scheme. You write a new article for your semi-private blog. The bulk of this article (or maybe just a small part of it) is a well-delimited block of ciphertext. Maybe it looks like the following:

Key: http://some.location/key_identifier

We use special tags to denote the beginning and end of the special contents. This is easy for a person to pick out visually, and is also easy for a program to parse. The first line points to a URL with keying information for this article. We'd expect many articles to use the same key, since there's no reason not to. Keys should be changed occasionally, to prevent certain attacks that come from large amounts of available ciphertext, and when you want to deny someone who previously had access to your articles access to any new ones. We'll use a nice strong symmetric key encryption algorithm, such as AES-256.
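To make the "easy for a program to parse" point concrete, here is a minimal Python sketch of pulling such a block out of a page. The delimiter strings `BEGIN` and `END` and the base64 encoding of the ciphertext are assumptions made for illustration; the post does not fix either one. Only the `Key:` line comes from the example above.

```python
import base64
import re

# Hypothetical delimiters: the exact tag text is an assumption, not part of the scheme.
BEGIN = "-----BEGIN CRYPTOBLOG-----"
END = "-----END CRYPTOBLOG-----"

# BEGIN tag, then the "Key: <url>" line, then the ciphertext lines, then the END tag.
BLOCK_RE = re.compile(
    re.escape(BEGIN)
    + r"\s*\nKey:\s*(?P<key_url>\S+)\s*\n(?P<body>.*?)\n\s*"
    + re.escape(END),
    re.DOTALL,
)


def extract_blocks(page_text):
    """Return a list of (key_url, ciphertext_bytes), one per delimited block."""
    blocks = []
    for match in BLOCK_RE.finditer(page_text):
        # Assumed encoding: base64 text, possibly wrapped over several lines.
        encoded = "".join(match.group("body").split())
        blocks.append((match.group("key_url"), base64.b64decode(encoded)))
    return blocks
```

The returned URL tells the reader's software where to look for keying material, and the bytes are what would be handed to the AES decryption step once the key is recovered.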

We now have our encrypted article. How do we distribute the keys? The simplest way to do this is through another blog post. We have one key, but we want to make it available to a potentially large number of people. Let's say each of them has an RSA public key. A simple way to propagate the key is with a list of the following form:

Alice   E(Alice,key)
Bob     E(Bob,key)
Charlie E(Charlie,key)

Here the first column is the person's name, and the second is the key encrypted with that person's public key. This isn't great, from a privacy standpoint, because you've just transmitted the names of all your friends. Slightly better is:

Pubkey(Alice)   E(Alice,key)
Pubkey(Bob)     E(Bob,key)
Pubkey(Charlie) E(Charlie,key)

Now we haven't revealed anyone's name, but we've revealed their public keys. This allows someone to correlate public keys between subsequent AES keys, revealing the degree of churn in your list of friends. Also, by publishing pairs of public keys and ciphertexts, you're potentially giving an adversary a leg up in cracking the corresponding private keys. Since just a smidge more paranoia costs us very little, let's instead go with the following:

H(Pubkey(Alice)|E(Alice,key))     E(Alice,key)
H(Pubkey(Bob)|E(Bob,key))         E(Bob,key)
H(Pubkey(Charlie)|E(Charlie,key)) E(Charlie,key)

The first column is now a hash of the person's public key and the ciphertext in the second column. Note that previously, your friend could immediately recognize the appropriate line of keying material to decrypt in order to retrieve the AES key. Now he or she has to perform a simple hash based on each line until one of them matches. The hash function doesn't have to be particularly great for this, so we can use something simple like MD5 without worrying about security or privacy being appreciably compromised.
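The line-matching step can be sketched in a few lines of Python. This assumes that `|` means plain byte concatenation and that public keys and ciphertexts are available as byte strings; the function names are made up for illustration. The RSA encryption itself is left out, so the "wrapped" values here are just placeholders for E(person, key).

```python
import hashlib


def line_tag(public_key_bytes, wrapped_key):
    """H(Pubkey | E(person, key)), taking '|' to mean concatenation (an assumption)."""
    return hashlib.md5(public_key_bytes + wrapped_key).hexdigest()


def build_key_list(recipients):
    """Blogger side: turn (public_key_bytes, wrapped_key) pairs into published lines."""
    return [(line_tag(pub, wrapped), wrapped) for pub, wrapped in recipients]


def find_my_wrapped_key(key_list, my_public_key_bytes):
    """Reader side: hash each line with our own public key until one matches."""
    for tag, wrapped in key_list:
        if line_tag(my_public_key_bytes, wrapped) == tag:
            return wrapped
    return None  # we are not on this key list
```

A reader who is on the list gets back the ciphertext to decrypt with their private key; anyone else gets nothing and learns only how many lines there are.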

What are our security and privacy properties now? Well, your semi-private articles should be well protected by encryption, and your friends should be able to recover the symmetric key. The identities of your friends are protected, for the most part. What data does this system leak, though?

  1. The hosting service knows who's retrieving your posts, though not who's successfully decrypting them.
  2. The world in general knows how often you are posting.
  3. The world in general knows how long your posts are.
  4. The world in general knows how many people are able to read your posts.

We could do better if we were self-hosted, but this is about the limits of using a free service like Blogger. If you think you have a way to reduce the amount of data leaked, please let me know.

That's the scheme, but how to implement it is another matter. We'd like to have some way for someone to navigate to an article and be presented with a decrypted page. The easiest way to do this is probably to create a Firefox extension. Note that this must be written in JavaScript and CSS. The state of cryptography in JavaScript isn't great, from what I've found poking around online. If a person's public and private keys are loaded into the browser, then the extension should be able to use them to decrypt first the symmetric key and then the article. The extension should probably cache the ciphertext (not the plaintext!) of the symmetric key, since it'll likely be used multiple times. The URL identifies the keys sufficiently at that point.
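The caching idea doesn't depend on the language, so here it is sketched in Python rather than as extension code. The class name and the `fetch` callback are hypothetical; `fetch` stands in for whatever downloads the key post and picks out the reader's line. What gets stored is the still-encrypted symmetric key, indexed by the key URL.

```python
class WrappedKeyCache:
    """Remembers the encrypted (never the decrypted) symmetric key for each key URL."""

    def __init__(self, fetch):
        # fetch(key_url) -> bytes is a caller-supplied function (hypothetical here)
        # that retrieves the reader's wrapped key from the key post.
        self._fetch = fetch
        self._wrapped_by_url = {}

    def get(self, key_url):
        """Return the wrapped key for key_url, fetching it only on first use."""
        if key_url not in self._wrapped_by_url:
            self._wrapped_by_url[key_url] = self._fetch(key_url)
        return self._wrapped_by_url[key_url]
```

The private key is still needed on every use to unwrap the result, so a copy of the cache by itself gives an attacker nothing that isn't already published on the blog.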

For most people, the public key is likely to be the most intimidating part. Someone running Linux can easily create an RSA key using OpenSSL. There's no need for a signed certificate. I don't know what would need to be done on Windows. If the blogger is reasonably crypto-savvy, then a BER- or DER-formatted RSA public key, an X.509 certificate, or a PGP/GPG certificate should be equally effective mechanisms for relaying public keys. Generating the list of ciphertexts for a new symmetric key will probably be done on the command line. We'll worry about friendlier interfaces later.

A really nice feature of a scheme like this is that if one of your friends forgets his private key password, he can just send you a new public key, and you can either email him the key ciphertexts or edit the old key postings to add the ciphertext for the new public key.


Steve said...

Quick thought on keys: you could use a time-dependent evolving key together with a strong second key to avoid large ciphertext attacks. Since the posts are time-tagged, create a _very_ short post, then put the data in the comments with the post's timestamp as quasi-salt for the comment encryption. Then the maximum ciphertext is the length of the comment thread on a single post. This would be similar to those RSA dog-tags that so many companies use to secure external access to their networks.

Mike Marsh said...

I really think that's overkill. Monthly/weekly/daily rekeying should be sufficient. Per-post rekeying would also work, and the keying info could be done as a set of metadata tags at the beginning of the CRYPTOBLOG block.

Steve said...

Yeah. That just came to mind because I have a deep affinity for strong hashes of integers -- more generally, hashes of sequences -- as random number generators. As you know, I'm a big fan of variable dimensionality Monte Carlo integration...

Mike Marsh said...

I prefer to use a more thoroughly tested cryptosystem. Trying to create a one-time pad using a keyed hash (which I'm assuming is more or less what you're suggesting) could end up having some serious flaws.

Steve said...

Fair 'nuf. I'm not an expert and not really up on the current status of what is deemed to be secure and what is not. I was merely drawing an analogy.

One other comment, though: Python (or any equivalent language) gives you the capability to interact with web services. It might be worth considering a local client in Python that scrapes the blog and presents a local service for a browser to view. That gets you around the lack of cryptographic tools in JS and gives you more flexibility in handling the content.

I would also think that Python or any other language could be accessed in a browser plugin, so in principle the client does not need to be a daemon or command line utility.

I'm interested in this because I have an application waiting in the wings that would be conveniently wrapped in a browser, but would require some clever code in something like Python to run the back end. I've thought about using a server but this notion of a plugin that accesses and manipulates the desired information is intriguing.

Mike Marsh said...

You know, you raise a good point. I could create a Java applet and have it be browser-neutral. That'd also get around needing a local service, and I've gotten fairly comfortable with Java. Hosting for the applet might be an issue.