Monday, December 8, 2008

iPhone web enhancement

Notes from my talk at SHDH on improving iPhone browser accessibility on regular websites:

Layout

You can use a separate stylesheet to overwrite style settings for your site:

<link rel="stylesheet" href="mobile.css" media="handheld, only screen and (max-device-width: 480px)" type="text/css" />

You almost certainly want to reset the viewport size. By default Mobile Safari likes to pretend it has a 980-pixel-wide screen and just shrinks everything to fit. You can control this with:

<meta name="viewport" content="width=320" />

This lets you be more precise with sizing.

For layout, you're basically looking at a single column. The screen simply isn't wide enough to effectively support two columns or sidebar nav, even in horizontal mode. I recommend leaving the column width unspecified; that way it'll resize nicely on orientation switch.

Your mobile stylesheet may need some font-size tweaking. You really need to test to find out.

Specify logo and header images etc in stylesheet, not within the HTML. This lets you replace them with smaller-size versions in the mobile stylesheet.

Typically, a small number of major functions should be buttons across or near the top. Put rarely used functionality at the very bottom (if you include it at all).

Forms and buttons

Go with native controls wherever practical. Specialised selectors generally mean that any attempt at styling form controls will be partially successful at best.

A person's thumb covers roughly 40x40px or worse - nothing like a mouse pointer. Control size matters.

Missed taps are much more common than missed clicks - don't put irritating combinations of controls next to each other, such as "Save" and "Navigate away from this page and forget everything you did".

Activity-response. If you're scrolled down on the page it's entirely possible to get no feedback at all on a link click. At the very least make sure you get the link changing color with :active.

Images

Preferably, generate mobile-sized thumbnails. 3G is really slow.

Caching

The iPhone has a really small cache, both in number of items and in size. The maximum size per item is 25KB, and roughly 18-19 items can be cached at most. Replacement is LRU.

The iPhone will *only* cache stuff with Expires or Cache-Control: max-age headers. If you don't set these headers, stuff will get reloaded again and again.
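As a rough sketch of what setting both headers might look like, here's a small Python helper. The function name and the one-day lifetime are my own choices for illustration, not anything the iPhone requires:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(max_age_seconds, now=None):
    """Build Expires and Cache-Control headers for a cacheable response."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }


if __name__ == "__main__":
    # Cache a static asset for one day (an arbitrary example lifetime).
    for name, value in caching_headers(86400).items():
        print(f"{name}: {value}")
```

Attach the resulting headers to your static responses however your server or framework normally does it.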

Cached stuff is stored *uncompressed*. Sending compressed content won't help with cache size (it will help with traffic, though).

Minification and similar techniques are hugely valuable here if you're JavaScript-heavy.

Javascript etc:

Mobile Safari is pretty much a full WebKit implementation. There's not a lot it can't handle. Unless you're doing something really, really crazy it'll probably be fine.

Flash does not work at all. Don't even bother.

Video
MPEG-4, preferably encoded for iPhone size - 480x320, unless it's screen linked. 640x480 is the maximum.
Link directly to the mpeg to get the in-screen view.

Youtube is a bit weird, you may need to test things.

Separate sites

If you're doing a mobile version of your site, rather than just integrating the stylesheet, the convention is to put it on an m. subdomain. Some sites automatically redirect you from the main site if you turn up with a mobile user agent, and give you a cookie-triggered option to go back to the main site (in case you want functionality not implemented on the mobile version or whatever).
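The redirect-with-opt-out logic can be sketched roughly like this in Python. The cookie name, the user-agent substrings and the function itself are all hypothetical, picked purely for illustration:

```python
# Assumed user-agent substrings; real detection usually needs a longer list.
MOBILE_UA_HINTS = ("iPhone", "iPod")
# Hypothetical cookie set when the user asks for the full site.
FULL_SITE_COOKIE = "full_site"


def mobile_redirect_target(host, path, user_agent, cookies):
    """Return the m. URL to redirect to, or None to serve the main site."""
    if host.startswith("m."):
        return None  # already on the mobile site
    if cookies.get(FULL_SITE_COOKIE) == "1":
        return None  # user explicitly asked for the main site
    if any(hint in user_agent for hint in MOBILE_UA_HINTS):
        bare_host = host[4:] if host.startswith("www.") else host
        return f"http://m.{bare_host}{path}"
    return None


if __name__ == "__main__":
    ua = "Mozilla/5.0 (iPhone; U; CPU iPhone OS 2_2 like Mac OS X)"
    print(mobile_redirect_target("www.example.com", "/photos", ua, {}))
```

The "back to the main site" link on the mobile version would then just set that cookie and link to the main host.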

Examples

m.flickr.com

Friday, December 5, 2008

Internal messaging systems

In social apps it's not uncommon to have some level of internal messaging system. The general idea is that it allows two users to get in touch without exposing emails or other contact details - keeping initial contact at arm's length and within context.

The traditional fast solution to this is to have a table within the database that takes a set of messages, each containing a sender and a receiver, plus subject etc.

This works fine until it comes to deletion (in case readers haven't noticed, deletion is the cause of much trouble within web application databases). The problem is that when the receiver deletes their message, the sender should still see it in their "Sent" box. Similarly, deletion by the sender from the Sent box should not remove the message from the receiver's inbox.

Initially the issue appears simple. We add two flags, sender_deleted and receiver_deleted, and that's that. This works until someone requests group messaging, Cc: or multiple targets.

At this point, the architecture actually tends to work better if we think of each user's mailbox as independent. That is, when a user creates a message it gets saved in their "Sent" folder, and a copy is delivered into each receiver's "Inbox", from which they can move it (if you support folders) or delete it, or whatever. Obviously this can be done in a single table; it's just that the mailbox owner and the recipient/sender are split up.
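To make that concrete, here's a minimal sketch in Python using an in-memory SQLite database. The table layout, column names and helper functions are assumptions of mine for illustration; the only point is that every copy has an owner, and deleting touches only the owner's copy:

```python
import sqlite3

# Hypothetical single-table layout: one row per mailbox copy.
SCHEMA = """
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    owner TEXT NOT NULL,      -- whose mailbox this copy lives in
    folder TEXT NOT NULL,     -- 'sent' or 'inbox'
    sender TEXT NOT NULL,
    recipients TEXT NOT NULL, -- comma-separated, for display only
    subject TEXT NOT NULL,
    body TEXT NOT NULL
)
"""


def send_message(db, sender, recipients, subject, body):
    """Save one copy in the sender's Sent folder and one per recipient inbox."""
    to_field = ",".join(recipients)
    copies = [(sender, "sent")] + [(r, "inbox") for r in recipients]
    with db:  # one transaction for all copies
        db.executemany(
            "INSERT INTO messages (owner, folder, sender, recipients, subject, body)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            [(owner, folder, sender, to_field, subject, body) for owner, folder in copies],
        )


def delete_message(db, owner, message_id):
    """Delete only the owner's copy; every other mailbox is untouched."""
    with db:
        db.execute("DELETE FROM messages WHERE id = ? AND owner = ?", (message_id, owner))


def mailbox(db, owner, folder):
    """List (id, sender, subject) for one folder of one user's mailbox."""
    rows = db.execute(
        "SELECT id, sender, subject FROM messages WHERE owner = ? AND folder = ? ORDER BY id",
        (owner, folder),
    )
    return rows.fetchall()
```

Note that multiple recipients fall out for free: sending to three people is just three inbox copies, with no extra flags.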

There's an overhead here: obviously we're keeping at least two copies of every message. We can perform some degree of optimisation for this, but quite frankly it's almost never worth it, because independent mailboxes offer us a particularly compelling benefit: they scale like crazy.

Users only ever read their own mailboxes, thus it's a perfect candidate for partitioning. The cross-partition delivery mechanism used when you have independent mailboxes means that there is no synchronisation necessary, it's a smooth (and easy) process. Even better, we all have a good feeling for how it should work, because it's how SMTP works in essence.
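One simple way to pick a partition is to hash the mailbox owner. This is only a sketch under my own assumptions (a fixed partition count and a hypothetical helper name); real deployments often use a lookup table instead so users can be moved:

```python
import hashlib


def partition_for(owner, partition_count):
    """Map a mailbox owner to a partition number in a stable way."""
    # A cryptographic hash is used only because it is stable across
    # processes and machines, unlike Python's built-in hash().
    digest = hashlib.sha1(owner.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", partition_for(user, 8))
```

Delivery then becomes: write the Sent copy to the sender's partition, and write each Inbox copy to that recipient's partition, with no locking between them.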

In the end, I recommend that, unless you're feeling particularly lazy, you go for the independent mailbox architecture right from the start. It's only marginally more effort.

Thursday, December 4, 2008

Cookies and domains

A quick trick. If you're setting cookies from your site, prefix the domain with a ".", i.e. Domain: .phirate.com

The result of this is that any subdomain of phirate.com also receives the cookie, rather than it being limited to phirate.com. This greatly improves your flexibility under a number of circumstances:

1. www.phirate.com and phirate.com both pick up auth properly
2. You can create subdomains like search.phirate.com without having to do cookie bounces
3. You can utilize CNAMEs to distribute tasks to other web services like Amazon S3 and still have access to your cookies
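Setting the domain this way can be sketched with Python's standard http.cookies module. The cookie name and value here are made up for the example:

```python
from http.cookies import SimpleCookie


def cookie_header(name, value, domain):
    """Build a Set-Cookie header value scoped to a domain and its subdomains."""
    cookie = SimpleCookie()
    cookie[name] = value
    # Leading "." so that subdomains receive the cookie as well.
    cookie[name]["domain"] = "." + domain.lstrip(".")
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()


if __name__ == "__main__":
    # "display_name" is a hypothetical cookie name.
    print("Set-Cookie:", cookie_header("display_name", "hamish", "phirate.com"))
```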

While a pure session cookie isn't often particularly helpful in these instances, and thus with sessions it's not such a big deal (you need access to the session db to make any use of a session cookie), it's possible to include various signed annotated cookies for other tasks.

A classic example is the delivery of static content that needs a dynamic menu bar containing the user's name. In this case it's not a security risk if the name is wrong; it's simply a matter of convenience and aesthetic consistency. Storing the name in a separate cookie, possibly along with other data, allows a trivial bit of static javascript to pick the name up and put it in the right place, allowing you to maximise your use of static, cached content while still giving the appearance of dynamic content.
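For the cases where the cookie contents do need to be trusted, signing might look something like the following Python sketch. The "value|signature" format, the key and the function names are my own assumptions, not a standard:

```python
import hashlib
import hmac

# Placeholder key; a real one should be long, random and kept server-side.
SECRET_KEY = b"replace-with-a-real-secret"


def _signature(value, key):
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_value(value, key=SECRET_KEY):
    """Append an HMAC-SHA256 signature to a cookie value."""
    return f"{value}|{_signature(value, key)}"


def verify_value(signed, key=SECRET_KEY):
    """Return the original value if the signature checks out, else None."""
    value, sep, mac = signed.rpartition("|")
    if not sep:
        return None  # no signature present
    expected = _signature(value, key)
    # Constant-time comparison, on bytes so odd input cannot raise.
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return value
    return None
```

Any service that shares the key can verify the cookie without needing access to the session db. Signing stops tampering, not reading, so don't put anything secret in the value.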

Sites like Entrecard make heavy use of these techniques to drastically reduce load in many of the public pages. The pages themselves are static with javascript calls to perform layout of the top bar menu. In addition, dynamic content is pulled in via AJAX if the user is authenticated, otherwise static cached information is used. This improves the site resiliency in the face of sudden traffic spikes, since the majority of new users will arrive unauthenticated, while still allowing customisation of the page view for logged-in users.

There is one important caveat: Do not use the . technique for cookies when you're selling sub-domain hosting or similar scenarios where users can upload their own javascript or other cookie-accessing code. In this instance, it may allow the cookie values to "leak", and while that's not a big deal for just a username, you definitely don't want that happening for session ids etc.

Wednesday, December 3, 2008

Making your API easy to use

The web2.0 craze has placed a large emphasis on opening up your data, and some compelling benefits have come as a result of that (not to mention some cool third party apps), but there are a number of decisions you can make that, while simple, dramatically increase the utility of your API.


  1. There are some situations in which SOAP makes sense, but in general if you're talking about a public web application, you want to deliver your API in a simple HTTP/XML or HTTP/JSON fashion. This drastically reduces the workload for people trying to create simple apps. Avoid using odd HTTP methods. Even things like PUT, if you can. Use GET and POST exclusively.

  2. Version your API. The URL to the API should absolutely contain a version number. The main reason for this is that otherwise people who'd like to create apps with a slow delivery pipeline (*cough*Apple iPhone*cough*) don't dare use your API - if it takes 3 months to get an update through, their customers could be down for 3 months if you change without warning.

  3. Design for load. If your data fits the pub/sub pattern, check out services like Gnip which can take all the load off you. If it doesn't, make sure you pay careful attention to caching issues.

  4. Let your API consumers help you. In general, programmers are happy to go a little extra effort in order to reduce the work you have to perform. If you give them the option to retrieve a reduced dataset, or a cached old one, or provide a more complete set of search operators so that the data returned can be limited, all of these will see use by people expecting to hit your service hard.

  5. Make API keys easy to get. Many third party developers get started on a whim, and the last thing you want is to have them put off scratching that itch because getting a key requires your personal approval (this is appropriate sometimes, obviously, but in general not).

  6. Add/Edit functionality is relatively rare in APIs at the moment. This is a shame, as it prevents third parties from embedding your app's functionality into desktop clients, iPhone apps, Firefox toolbars etc. Seriously consider having more than just data retrieval. It might hurt your ad revenues, but it gives you way more staying power.

  7. Simple authentication where necessary. Basic AUTH is fairly standard, and is perfectly secure if you insist on HTTPS.
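Points 1 and 2 can be sketched together as a tiny dispatcher in Python. The URL layout, the resource name and the response shapes are invented for the example; the idea is simply that v1 keeps working unchanged after v2 ships:

```python
import re

# Assumed URL layout: /api/v<number>/<resource>
API_PATH = re.compile(r"^/api/v(?P<version>\d+)/(?P<resource>[a-z_]+)$")

# Hypothetical handlers; each version keeps its own response shape.
HANDLERS = {
    (1, "photos"): lambda: {"photos": []},
    (2, "photos"): lambda: {"items": [], "next_page": None},
}


def dispatch(method, path):
    """Return (status, payload) for an API request."""
    if method not in ("GET", "POST"):
        return 405, None  # stick to GET and POST
    match = API_PATH.match(path)
    if match is None:
        return 404, None
    handler = HANDLERS.get((int(match["version"]), match["resource"]))
    if handler is None:
        return 404, None  # unknown version or resource
    return 200, handler()


if __name__ == "__main__":
    print(dispatch("GET", "/api/v1/photos"))
    print(dispatch("GET", "/api/v2/photos"))
```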


APIs are, like many aspects of web application design, rather dependent on the problem. That said, it is well worth pushing for as much API as you can build out - every extra method increases the chances that someone will go out and give you a ton of free advertising by building something cool on top of your service.

Finally, designing an API in from the beginning often helps rationalise and make sense of your own software design in a fashion which a straight web application doesn't. The need to abstract controller and model operations helps ensure you factor your code correctly and makes future changes simpler (indeed, in some cases it's possible to base parts of your site off your own API, allowing trivial, secure delegation to subcontractors or junior coders).

Tuesday, December 2, 2008

Un-fucking-believable

The potential evacuation of Kiwis trapped by the protests was thrown into disarray after the air force's two Boeing 757s were declared out of action.

I have no words.

"I am sure the Government is trying to do its best, but I was rather surprised to hear that there are no contingency plans," Mr Goff said.

Shut up labour boy. I voted for you, but there is no way you slime your way out of this by blaming the newbies. Contingency plans should have been there a long time ago. Some time in, say, the last 9 years.

Somewhere, there's a flight engineer banging his head against a fuselage going "I FUCKING TOLD THEM AGAIN AND AGAIN AND AGAIN..". He will go home and rant to his poor, long suffering wife, and not a single one of the morons who let this happen will ever be held to account.

Time and testing

A common problem encountered when creating test suites against stateful libraries (particularly database-backed ones) is the presence of time and time-dependent states and actions.

In general, this gets even nastier if you're using NOW() or similar in your SQL statements. So, a few pointers:

Where practical, supply the time to the query specifically. I know it's annoying, but there are a few good reasons for this:

1. Supplying it means that a sequence of actions in a transaction will all get the exact same time, which helps if you're trying to reconstruct things later on.
2. PostgreSQL specific: NOW() is an expensive operation, and *it is not cached*. That is, if you put NOW() in a subselect, it will execute once for every single execution of the subselect. This sucks performance wise and can crush your query if you've got a big dataset.
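Supplying the time explicitly might look like this. It's a sketch in Python against SQLite rather than PostgreSQL, and the table names and function are hypothetical; the point is that one timestamp is computed once and bound into every statement in the transaction:

```python
import sqlite3
from datetime import datetime, timezone


def record_signup(db, user, now=None):
    """Insert a user and an audit row that share exactly the same timestamp."""
    # Tests pass a fixed `now`; production callers leave it out.
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat()
    with db:  # single transaction
        db.execute(
            "INSERT INTO users (name, created_at) VALUES (?, ?)", (user, stamp)
        )
        db.execute(
            "INSERT INTO audit_log (event, created_at) VALUES (?, ?)",
            (f"signup:{user}", stamp),
        )
    return stamp


def make_test_db():
    """Create the two hypothetical tables used above."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT, created_at TEXT)")
    db.execute("CREATE TABLE audit_log (event TEXT, created_at TEXT)")
    return db
```

Because the time is just a parameter, a test can call record_signup with any date it likes.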

If you need to use a database call for the current time (or just want to, sometimes it's more elegant, in defaults etc), I recommend using a stored procedure as a wrapper for NOW(). This way, when you're building up your database for testing, you can replace the wrapper stored function with one that returns a fixed time, allowing you to shift time around as you see fit without mucking with the real system time (which is always painful, especially if it's your desktop).
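Here's the same idea sketched in Python with SQLite, which lets you register a SQL function from application code. This is a stand-in for a PostgreSQL stored function, not the same mechanism, and app_now and the sessions table are names I've made up:

```python
import sqlite3
from datetime import datetime, timezone


def install_clock(db, clock=None):
    """Register app_now() in SQL; pass a fake clock to freeze or shift time."""
    clock = clock or (lambda: datetime.now(timezone.utc))
    # Registering the same name again replaces the previous definition.
    db.create_function("app_now", 0, lambda: clock().isoformat())


def live_sessions(db):
    """Count sessions that have not yet expired according to app_now()."""
    rows = db.execute(
        "SELECT COUNT(*) FROM sessions WHERE expires_at > app_now()"
    ).fetchall()
    return rows[0][0]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, expires_at TEXT)")
db.execute("INSERT INTO sessions (expires_at) VALUES ('2009-01-01T00:00:00+00:00')")

# Test run 1: pretend it is December 2008, before the session expires.
install_clock(db, lambda: datetime(2008, 12, 2, tzinfo=timezone.utc))
before = live_sessions(db)

# Test run 2: jump forward to June 2009 without touching the system clock.
install_clock(db, lambda: datetime(2009, 6, 1, tzinfo=timezone.utc))
after = live_sessions(db)

print(before, after)
```

All queries go through app_now(), so swapping the wrapper is the only thing a test has to do to move time.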

Isolating time like this allows you to test the full gamut of scenarios within your library and ensure that everything will happen as expected in the future - a situation all too many test suites fail to take account of.

Monday, December 1, 2008

I hate TV

Seriously, the absolute last thing they should be doing is putting that kid on TV. Quite frankly, all it's going to do is encourage all the other script kiddies. Despite the constant assertion of clueless TV reporters to the contrary, creating and managing botnets is neither difficult nor a sign of some kind of amazing insight into information technology. The software to do it is readily available and barely more difficult than installing iTunes. Just about any teenager who spends all their time in front of a computer could do it trivially, let alone people with actual training in the relevant areas.

The only difference between them and this kid is that they don't, because it's *wrong*.

There are people on the internet, many of them suffering from one form of social dysfunction or another, who are unable to empathise with others, and thus are happy to take advantage of them. This is not news, nor is it confined to the internet. Possibly the only news here in fact is that law enforcement managed to catch him. The people who are actually good at this aren't on TV, because catching them behind myriad layers of fakes, crypto and one-way lines of control is extremely difficult to coordinate, with control relays and cutouts in countries across the globe, often in uncooperative jurisdictions and in organisations with no IT staff. Fortunately for all of us he got greedy before he got better at it, and no doubt a money trail provided both better information and more motivation for investigators.

And will you all stop asking when he'll be offered a job? He didn't do anything that would make him more valuable than the risk posed by his clear ethical deficit. It's not like we (the IT profession) don't know how people like him achieve what he does. It'd be like asking someone who performed a standard smash-and-grab when the police were going to hire him for his insight into how it's done. It's a smash-and-grab, the police *know* how it's done, the difficulty is simply that it's not practical to secure everything against it - we rely on the fact that we catch most of them, eventually, and that the remainder of the population has some sense that it is cruel and unfair to do this kind of thing to others.

Personally, I wouldn't hire him over most half-decent coders his age - at least with the others there's a reasonable chance they won't think they're clever trying to install backdoors in your systems when you're not looking. Experience suggests it takes these fools another ten years minimum before they grow up and start to understand the kind of impact their idiocy has.

SHDH coming up

New SHDH coming up on the 7th. Yours truly will be doing a workshop on improving your website for mobile browsers. The idea, basically, is that the majority of websites (that have a reasonably CSS-heavy design element at least) tend to display fairly poorly on browsers like mobile safari - they're there, you can read them, but they're not simple. The addition of a mobile stylesheet is often all that's necessary to dramatically improve the usability, especially where touch-screen devices are concerned.

The workshop will involve a short rundown on strategies to achieve improvements, and then a collaborative attempt to improve some sites. There will be no mocking of existing designs so you should bring a laptop with a checkout of your site on it so you can play.

The motivation for this is mostly selfish - I have a mobile browser and I'd love it if more websites sorted their CSS out to make my life easier.

Other secrets

Occasionally within web applications we have to generate secrets that aren't passwords. While I've covered passwords in general before, these secrets often have a different context.

A classic example is a coupon code. A coupon code is distinctly different from a user password:

1. It is normally one-shot, and often linked in an email, so remembering it isn't such a big deal
2. It rarely has a "key", in the sense that if your keyspace for your coupon code is 100,000, and you have 10,000 coupon codes active, an attacker only needs to guess 10 times on average to hit the jackpot - they don't need the "user name" to go with it.
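The arithmetic in point 2, as a quick Python check. The function is just my shorthand for "keyspace divided by live codes", which is a fair approximation when guesses are random:

```python
def expected_guesses(keyspace, active_codes):
    """Approximate number of random guesses needed to hit any live code."""
    return keyspace / active_codes


if __name__ == "__main__":
    # The example from the text: 100,000 possible codes, 10,000 active.
    print(expected_guesses(100_000, 10_000))
```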

In this case, you always want a bigger keyspace than a regular password. In addition, you want something that works well when printed and, of course, doesn't contain any naughty words.

One of the simplest ways to make this happen is the following:

1. 12 characters, all upper case
2. Remove confusing characters (I, L, J, 1, 0, O, U, V, 5, S) from the list.

This gives you nice readable characters: 36 letters and digits minus the 10 confusing ones leaves 26, so 12 characters give a space of 26^12 = 95,428,956,661,682,176. Then, to get rid of all the naughty words, a trivial trick:

3. Remove all the remaining vowels

You can't make dodgy words without vowels. Not ones people can reasonably take offense to anyway. It's simple, and avoids having big long useless blacklists.

And out of it, you get:

HF8DDHNRRPKQ
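A minimal sketch of a generator following those three steps, in Python. I'm assuming "remaining vowels" means A and E (I, O and U are already gone) and that Y stays in; adjust VOWELS if you disagree:

```python
import secrets
import string

CONFUSING = set("ILJ10OUV5S")  # step 2: easily misread characters
VOWELS = set("AE")             # step 3: assumption, Y is kept

ALPHABET = "".join(
    ch
    for ch in string.ascii_uppercase + string.digits
    if ch not in CONFUSING | VOWELS
)


def coupon_code(length=12):
    """Generate a coupon code from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(ALPHABET, len(ALPHABET))
    print(coupon_code())
```

Dropping the two vowels leaves 24 characters, so the space shrinks to 24^12, roughly 3.65 x 10^16. That's still plenty. The secrets module matters here: the ordinary random module is predictable enough that you shouldn't use it for anything worth money.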

If you're sending this in an email and it's likely to be a phone-in, remember to give a phonetic representation as well (Hotel-Foxtrot-8-Delta-Delta etc etc). This saves your users coming up with embarrassing phonetics of their own.
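Generating the phonetic version is easy enough to automate. A sketch in Python, using the usual NATO alphabet words (with "Xray" unhyphenated so it doesn't clash with the separator) and leaving digits as they are:

```python
NATO_WORDS = (
    "Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliett Kilo "
    "Lima Mike November Oscar Papa Quebec Romeo Sierra Tango Uniform Victor "
    "Whiskey Xray Yankee Zulu"
).split()

# Map each letter to its word: "A" -> "Alpha", "B" -> "Bravo", ...
NATO = {word[0]: word for word in NATO_WORDS}


def phonetic(code):
    """Spell a code out phonetically; non-letters pass through unchanged."""
    return "-".join(NATO.get(ch, ch) for ch in code.upper())


if __name__ == "__main__":
    print(phonetic("HF8DDHNRRPKQ"))
```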