a glob of nerdishness

November 19, 2009

Twitter context bookmarklet

written by natevw @ 2:03 pm

Sometimes I’ll end up on an individual Twitter post that is obviously a continuation of a previous train of thought and wish I could see the context without having to re-find the tweet way back in the user’s stream. According to the interwebs, Twitter has been hiding a solution to this for a while, but I just noticed it today.

I’d be surprised if someone hasn’t already done this, but I’ve whipped up a bookmarklet that makes it easy to jump to a tweet in its context:

Tweet context

Just drag that link to your toolbar, and click to go from a standalone tweet to the stream of posts leading up to it.

November 18, 2009

Go Unicode

written by natevw @ 3:27 pm

I’ve been eagerly learning about Go lately. It’s a nascent systems programming language with a nice design and some great features.

I have also gotten back into reading through my Unicode 5.0 book over the past few weeks, so when I saw that Go had a built-in string type I was immediately curious what that meant. My initial conclusion was not good.

Then I realized one of the Go team’s earlier inventions was UTF-8, a Unicode encoding with many pragmatic properties. After a little more research, here’s the skinny.

Go’s strings:

The last point, that Go does not normalize strings, could be seen as a drawback: it means <LATIN SMALL LETTER E WITH ACUTE> will not compare equal to the canonically equivalent sequence <LATIN SMALL LETTER E> + <COMBINING ACUTE ACCENT>. However, doing this natively in Go would require every standalone binary to include a large set of Unicode character tables. Furthermore, Unicode defines two kinds of equivalence, canonical and compatibility. I have an opinion on which one the Go compiler itself should eventually use for token comparison, but at runtime neither could serve as the one meaning of the string comparison operator.

Like so much of Go’s design, the way strings work is an elegant compromise that encourages useful idioms without making decisions a programming language shouldn’t. Normalization goes beyond settling the encoding question and begins a climb up the tall stack of human language concerns. (Check out the ICU project for a sampling of basic Unicode toppings.)

One final note about the implications of Go’s string type: in C, it can be tempting to use string functions on binary data known to contain no inner ‘\0’ bytes. Go’s type system should make the distinction obvious, but use uint8 slices ([]byte), never strings, for binary data in Go. Even if your data contains no ‘\0’ bytes, iterating over it as a string will not yield what you expect: a range loop decodes UTF-8, so valid multi-byte sequences collapse into single code points and invalid bytes come back as the replacement character U+FFFD.