Geocoding while blogger (2)

Welcome! As promised in my previous post, today I am releasing a Greasemonkey script that allows geocoding, using the geo microformat, in Blogger’s interface. Remember my previous fit of creativity concerning names? Well, I’ve decided to name this one InsertGeo (don’t laugh at me, just click on this link if you want to install it without reading any further; of course, this assumes Greasemonkey is already installed and in good working order on your Firefox). The screen capture next to this paragraph gives an idea of what to expect after installation.

This script will add a button to the Edit Html editor tab on Blogger’s new post and edit post pages. You may select some text in the editing area before pressing the button; the text box will appear filled with the selected text, which may be edited. Or you might prefer not to select anything beforehand: just enter in the text box the address you want to have geocoded. Press OK, and after a (hopefully) short wait you will see the geo microformat code entered in your post.

This is the kind of code the script generates for a given address (incidentally, the place where I spent my last, fondly remembered vacation):

  • Address to geocode:

Avenida de la Mojarra, Ayamonte, Huelva, Spain

  • Microformat generated:
<span class="geo">Avenida de la Mojarra, Ayamonte, Huelva, Spain (
  <abbr class="latitude" title="37.186030">37º11'10" N</abbr> 
  <abbr class="longitude" title="-7.338429">7º20'18" W</abbr>)
</span>

Avenida de la Mojarra, Ayamonte, Huelva, Spain (37º11’10” N 7º20’18” W)
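To make the output above less magical, here is a minimal sketch of how decimal coordinates could be turned into that markup. The function names (toDms, formatCoordinate, buildGeo) are hypothetical and not taken from InsertGeo itself; the sketch assumes coordinates arrive as plain numbers and that the address needs no HTML escaping.

```javascript
// Hypothetical helpers, for illustration only (not InsertGeo's actual code).

// Convert decimal degrees to whole degrees, minutes and seconds.
// Rounding is done once, on total seconds, so 59.6" never shows up as 60".
function toDms(decimal) {
    var totalSeconds = Math.round(Math.abs(decimal) * 3600);
    return {
        degrees: Math.floor(totalSeconds / 3600),
        minutes: Math.floor((totalSeconds % 3600) / 60),
        seconds: totalSeconds % 60
    };
}

// Format one coordinate as an <abbr>: machine-readable decimal value in the
// title attribute, human-readable DMS string as the element's content.
function formatCoordinate(decimal, className, positive, negative) {
    var dms = toDms(decimal);
    var hemisphere = decimal < 0 ? negative : positive;
    var label = dms.degrees + 'º' + dms.minutes + "'" + dms.seconds + '" ' +
        hemisphere;
    return '<abbr class="' + className + '" title="' + decimal.toFixed(6) +
        '">' + label + '</abbr>';
}

// Compose the whole geo microformat (assumes address is already HTML-safe).
function buildGeo(address, latitude, longitude) {
    return '<span class="geo">' + address + ' (\n  ' +
        formatCoordinate(latitude, 'latitude', 'N', 'S') + ' \n  ' +
        formatCoordinate(longitude, 'longitude', 'E', 'W') + ')\n</span>';
}
```

Calling buildGeo('Avenida de la Mojarra, Ayamonte, Huelva, Spain', 37.186030, -7.338429) should give markup equivalent to the sample above.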

The actual geocoding is performed by Yahoo Pipes. There are other services available on the Net, but the flexibility and extensibility of Pipes are hard to match. The pipe I am using (also by yours truly) is called geoloc_pipe; if you take a peek at it, you will notice that there is more data built into the feed than is needed to compose a geo microformat. In fact, there is just enough to generate an adr microformat. An idea for further development?


Geocoding while blogger

Suppose you go somewhere cool on vacation (it has to be cool enough to trigger show-off reflexes). Blogging about it, you’d think how nice it would be if your mother (er, your readers) could click on something to have the location displayed on a mapping service (Google Maps, Yahoo Maps and the like). Let’s envision some scenarios:

  1. You look up your vacation spot on one of those mapping services, and link to the result page from your blog. This has the drawback of denying your readers their choice of mapping provider; moreover, it’s not always trivial to discover the URL to link to (hint: with Yahoo Maps, you have to copy and paste the URL from your browser’s address bar; Google Maps offers a Link to this page tool).
  2. Same as before, but getting (somehow) longitude and latitude and pasting those into your blog. This is not trivial with either of the main mapping providers: the coordinates are there, but they are hidden in plain sight. You have to rip them out of the result page URL:

    http://maps.yahoo.com/#mvt=m&trf=0&lon=-3.682276
    &lat=40.471706&mag=2

    There you are: coordinates in decimal degrees. You have more work to do if you want to show your location in a more readable format, e.g., DMS (degrees, minutes, seconds). Plus, you have to rely on your readers manually entering the data into their mapping provider of choice, which could be more hassle than it’s worth.

  3. Use a geo microformat to mark up your location data. You still have to obtain geographic coordinates from somewhere, plus you have to insert the appropriate HTML to compose a valid geo microformat. On the other hand, your readers can benefit from having the microformat automatically parsed for them and made available in their mapping tool of choice (just be sure to direct them towards the Operator Firefox extension, for instance).
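As an aside on scenario 2, ripping the coordinates out of a result page URL can be sketched in a few lines. This assumes the Yahoo Maps URL layout shown above, with lon and lat carried as parameters after the # sign; the function name extractCoordinates is hypothetical.

```javascript
// Hypothetical helper: pull latitude and longitude out of a mapping URL
// whose fragment looks like "#mvt=m&trf=0&lon=-3.682276&lat=40.471706&mag=2".
// Returns null when the URL has no fragment or no usable coordinates.
function extractCoordinates(url) {
    var hashIndex = url.indexOf('#');
    if (hashIndex === -1) {
        return null;
    }
    // Split the fragment into name=value pairs
    var pairs = url.substring(hashIndex + 1).split('&');
    var params = {};
    for (var i = 0; i < pairs.length; i++) {
        var parts = pairs[i].split('=');
        params[parts[0]] = parts[1];
    }
    var latitude = parseFloat(params.lat);
    var longitude = parseFloat(params.lon);
    if (isNaN(latitude) || isNaN(longitude)) {
        return null;
    }
    return { latitude: latitude, longitude: longitude };
}
```

Other providers lay out their URLs differently, so the parameter names would need adjusting for each of them.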

There is a lot of work to be done in order to geoenable your content. For one, manually composing a microformat, even a simple one like geo, is a hassle (this example was lifted straight from microformats.org wiki):

N 37° 24.491 W 122° 08.313

Then there is the issue of extracting latitude and longitude for your location-based content. This process, termed geocoding, can be automated with a high level of reliability. Furthermore, there are several geocoding APIs available as services all over the Net. This raises the question: is it possible to tie every piece together, geocoding and microformat composition? A tool that closes the geo microformat lifecycle would be a boon to wider geo microformat adoption.

Now you know what I am about to release in the next post…

Script updates

UnitFormat and InsertUnit, the sibling scripts from the series Greasemonkey and Microformats, have been updated to reflect a small addition to the underlying measurement unit microformat: null units are now supported (with unit name null) when converting among number bases or to and from roman numerals.

This is only a convenience measure to account for the fact that Google Calculator does not understand queries with a source unit when the target one is a number base or ‘roman’: the expected

255 decimal in hex

does not work, and instead

255 in hex

must be used. The measurement unit microformat accommodates this quirk by adding a null unit that can be used as the source whenever such a conversion should take place.
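A minimal sketch of how a parser might honour the null unit when composing the query text follows. The function name buildCalcQuery is hypothetical, and the sketch assumes the query takes the plain "value unit in target" shape shown in the examples above.

```javascript
// Hypothetical helper: compose the text of a Google Calculator query.
// A source unit named 'null' is simply left out of the query.
function buildCalcQuery(value, sourceUnit, targetUnit) {
    var parts = [String(value)];
    if (sourceUnit !== 'null') {
        parts.push(sourceUnit);
    }
    parts.push('in', targetUnit);
    return parts.join(' ');
}
```

With this, buildCalcQuery(255, 'null', 'hex') yields the working form of the query, while ordinary units pass through untouched.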

The updated dialog of InsertUnit looks now like this:

Of course, nothing stops you from declaring yourself a purist and specifying your source units as decimal, hex, octal, binary or even roman, but for now UnitFormat, the other side of the coin, won’t handle those values properly (that is, it ignores them). I’ll post an update as soon as this issue is properly handled. See you!

Update (20070816163700) It’s done now.

Greasemonkey and microformats (5)

Leaving so soon? I was hoping to interest you in yet another release of my famous series Greasemonkey and Microformats! No, wait! Please, oh please, just a little more…

Well, for the few of you still there, here I am at it again. Today, let’s talk about something I fancy calling the microformat lifecycle. It’s easy enough: you and a dozen more geeks gather around a table to despise all committee-designed monstrosities of the world and come up with a shiny new microformat. After 34 meetings. All is well and parsers flourish, but for the newfangled microformat to gain any traction, there must be some content producers slapping it into their, well, whatever it is that they might be producing. Or, as Web 2.0 (or is it 3.0?) advocates would rather say, ‘semantically enhancing their creations’.

The key to microformat adoption, then, would be to bootstrap a healthy ecosystem of both producers and consumers. The best live example among established microformats is hCard. A vehicle for contact details, hCard has seen wide adoption for several reasons:

  • It’s a semantic HTML implementation of an already established standard: RFC 2426.
  • There are readily available consumers for it, such as the Operator Firefox extension, which makes embedded hCard data available as vCards readable by most contact management applications, among other interesting features.
  • There are several hCard producers, the simplest one being hCard Creator. A wide sampling of the hCard ecosystem can be found in the hCard implementations page.

A main concern of microformat supporters ought to be helping the non-HTML literate masses of the world to add semantic information to their markup. But there’s a catch-22 in there, as ‘non-HTML literate masses’, by definition, have at best a hazy understanding of markup, and well-intended pages like this hCard authoring guide aren’t of much help except to the most dedicated of hobbyists (well, we geeks can do with just a formal hCard description, XMDP-way, can’t we?).

By way of example, I’m putting my money where my mouth is and releasing a Greasemonkey script that adds measurement unit microformat support to Blogger’s new post (and edit post) pages. It’s creatively called InsertUnit, and after installation (just point your Greasemonkey-enabled Firefox browser to the previous link, thank you) it will add a button to the Edit Html editor tab just for your measurement unit microformatting pleasure. To illustrate its workflow, here are some screenshots. Who doesn’t like screenshots?

  1. First of all, select some text to denote as a measurement unit. You may skip this step and just press the damn button, already.
  2. That’s it: press the uF button. I should have come up with a better glyph, but that one (as ‘microformat’ abbreviation) is way economical, byte-wise (and not an image, if I may add).
  3. Here you can see the microformat description dialog. Any text selected in step 1 will appear in the corresponding text box, saving you some keystrokes and thus helping you avoid those dreaded RSIs.
  4. After filling the rest of the text boxes (you’ll need a scalar value for the measurement, a name for the unit —to Google’s liking— and a name for a suggested target unit —same way) you may click on the Test conversion link to do just that. If an error shows up, adjust your unit names until you get the desired result. There is no documentation on expected unit names, but Google did much work to keep it a matter of common sense.
  5. And here is the result after tapping on the OK button, in all its semantic HTML glory. Nifty, isn’t it?

Before calling it quits for the day, two remarks: you are not forced to specify a target unit name (but you should state at least a value and a unit name; the OK button doesn’t enable itself until after you’ve done so). You might be interested in marking up your units, but not keen on any particular conversion. InsertUnit is a good match with UnitFormat (as described in the third part of this post series), but by no means is a required match. You might come up with a better parser, for instance (hint, hint) one that is able to suggest target units —out of the blue, or reading from some preferences file.

Last one: now you can get this script and its natural fit from userscripts.org, here: UnitFormat (userscripts.org) and here: InsertUnit (userscripts.org).

Greasemonkey and microformats (4)

Once more unto the breach, dear friends! In this fourth (already?) installment of Microformats and Greasemonkey, we’ll wonder in sheer amazement at the unifying power of our two leading subjects when carelessly wielded against mounds of unstructured ignorance. How’s that for a beginning?

The previous post ended abruptly in a cliffhanger-wannabe fashion, suggesting great things to come from the UnitFormat Greasemonkey script. While it won’t regrow your lost hair or help you make fame and fortune, there are pesky measurement unit conversion tasks it can handle for you. Let’s throw an example.

Suppose you are publishing information about your wonderful hybrid car fuel consumption habits. You should know for a fact that the usual way to do that differs fundamentally between the UK (and the USA, for that matter) and the rest of the world. I may be thinking litres per 100 kilometres, while you are used to miles per gallon. Being a semantically conscious blogger, you may annotate your data this way:

My car does usually 
<abbr title="48" class="unit mi/gal l/100km">48 miles per gallon</abbr>

Live, it would look this way: 48 miles per gallon. With UnitFormat installed, you’d see this:

48 miles per gallon*

Note: if you have already installed UnitFormat, you ought to see two asterisks. That’s all right.

This is the generated HTML code. Note that it’s in the form of another measurement unit microformat, so screen readers and other tools may have a share of the benefits: it lacks a target unit, nevertheless.

<abbr title="48" class="unit mi/gal l/100km">48 miles per gallon</abbr><sup><abbr title="4.90030384" class="unit l/100km"><a title="4.90030384 l/100km">*</a></abbr></sup>
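The value in the generated title attribute can be checked by hand. Fuel economy (distance per volume) and fuel consumption (volume per distance) are reciprocals, so the conversion is a division rather than a multiplication. A small sketch, assuming US gallons (which is what the figure above implies) and a hypothetical function name:

```javascript
// Exact definitions of the units involved
var KM_PER_MILE = 1.609344;
var LITRES_PER_US_GALLON = 3.785411784;   // assumption: US, not imperial, gallon

// Hypothetical helper: convert miles per gallon to litres per 100 km.
// mpg -> km per litre -> litres per km -> litres per 100 km
function mpgToLitresPer100km(mpg) {
    var kmPerLitre = mpg * KM_PER_MILE / LITRES_PER_US_GALLON;
    return 100 / kmPerLitre;
}
```

For 48 miles per gallon this comes out at roughly 4.9003 l/100km, in line with what Google Calc returned.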

This is a tad more difficult than just converting feet to centimetres, but by now I’d bet you’ve already realised something far more interesting. Second example:

I paid <abbr title="23220" class="unit USD EUR">$23,220</abbr> for it.

Guess what? Google Calc handles the conversion all right:

I paid $23,220* for it.

Online currency conversions at your fingertips! But wait, there’s more. Third example:

I'm expecting a lifetime average cost of 
<abbr title="0.45" class="unit USD/mi EUR/km">45 cents per mile</abbr>.

This winds up as:

I’m expecting a lifetime average cost of 45 cents per mile*.

Complex units with variable conversion factors! Google Calc lets you specify nearly anything, but you should stay with the simplest unit names (following standards and common sense whenever possible). Composite units should not have spaces in their names (but /, * and ^ are explicitly allowed, so units like m/s^2 are possible). Aren’t microformats powerful, and isn’t life great?

Greasemonkey and microformats (3)

Welcome to the third installment in this series about microformats and Greasemonkey! In the last episode, after failing spectacularly at providing a not too long rant on the subject at hand (and even omitting completely the expected Greasemonkey bit), I boldly introduced a barebones measurement unit microformat proposal. Today, after deftly inserting another split infinitive just for the sake of it, I’ll tweak it a bit and leave it running full speed on a collision course with an unsuspecting Greasemonkey script. Let’s see what happens!

A microformat shouldn’t be a solution desperately in need of a problem (well, I suppose that could be said of any half-successful technology out there); our tiny measurement unit microformat is not. I, for one, coming from a southwestern European corner without any significant metrology history before the 1800s, have a pretty tough time grokking those strange “Imperial” or “customary” units with a glorious history of intellectual conquest. Intellectual it must be indeed, to come up in a snap with how many inches there are to the mile (and just which mile, to be sure?). That mental exercise must do wonders for intelligence. But what does technology have in store for me and the other zero-carriers and comma-displacers of the world?

In a more serious mood, it would be quite an advantage to have our browsers display automatic conversions for units. Microformats are the right tool for the job, without a doubt. Perhaps my proposal for a measurement unit microformat won’t quite cut it, but as simple implementations go, it’s quite powerful, given a single modification. Let’s add a target unit to the class attribute, and let’s illustrate it with a verse taken from the classic hit song Route 66:

[...] More than <abbr class="unit mi km" title="2000">two thousand miles</abbr> all the way [...]

Just how long is that? Well, I for one know that a (statute) mile is somewhat around 1.6 kilometres long. The fine author (not Bobby Troup, the hypothetical microformat author) has provided an additional unit name to the class attribute of <abbr>:

unit [source_unit] [target_unit]
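A parser for that class syntax can be sketched as below. This is a hedged illustration rather than the code UnitFormat uses: the name parseUnitClass is hypothetical, and the sketch assumes the source and target units are the two tokens right after the unit marker, tolerating other classes before it.

```javascript
// Hypothetical helper: read "unit [source_unit] [target_unit]" out of a
// class attribute value. Returns null when there is no "unit" token or no
// source unit after it; target is null when no target unit is given.
function parseUnitClass(className) {
    var tokens = className.split(/\s+/).filter(function (token) {
        return token !== '';
    });
    // Whole-token match, so classes such as "units" are not mistaken for "unit"
    var index = tokens.indexOf('unit');
    if (index === -1 || index + 1 >= tokens.length) {
        return null;
    }
    return {
        source: tokens[index + 1],
        target: tokens[index + 2] || null
    };
}
```

For the Route 66 verse, parseUnitClass('unit mi km') gives mi as the source and km as the target.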

Mental calculation ability aside, wouldn’t it be nice to have Firefox render that code snippet as:

[…] More than two thousand miles* all the way […]

Here is a simple Greasemonkey script (please refer to the nice documentation on how to install it) able to do just that: parse a web page for measurement units and provide an alternate display for them, as specified by the page author, in a non-intrusive way.

// ==UserScript==
// @name           UnitFormat
// @namespace      http://brucknerite.net
// @description    Unit microformat processing and conversion.
// ==/UserScript==

// License:
//
// Copyright (c) 2007 Ivan Rivera
//
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
// 
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
// 
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
// EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
// OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
// NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
// HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
// WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
// FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
// OTHER DEALINGS IN THE SOFTWARE.

// Select all <abbr> elements whose class attribute contains "unit" using XPath
var allUnits = document.evaluate(
    '//abbr[contains(@class,"unit")]',
    document,
    null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
    null);
// Send every one of them off to Google for conversion
for (var i = 0; i < allUnits.snapshotLength; i++) {
    queryGoogle(allUnits.snapshotItem(i));
}

/**
 * Queries Google via xmlHttpRequest for unit conversions. Origin and 
 * destination unit are specified in the class attribute of <abbr> elements 
 * with the following syntax:
 *      class="unit [origin] [destination]"
 * where origin and destination are mutually interchangeable units in a format 
 * acceptable to Google Calc (whatever that means). The response is handled by 
 * processResponseData(unitElem, html, altUnit). Any kind of error (parsing,
 * connectivity or otherwise) should result in the function returning without
 * side effects.
 *
 * @param unitElem  <abbr> DOM element representing a unit microformat.
 */
function queryGoogle(unitElem) {
    var value = unitElem.getAttribute('title');
    var classes = unitElem.getAttribute('class').split(' ');
    if (classes.length < 3) {
        return;
    }
    var unit = classes[1];
    var alt = classes[2];
    GM_xmlhttpRequest({
        method: 'GET',
        url: 'http://www.google.com/search?q=' + encodeURIComponent(value) + '+' +
            encodeURIComponent(unit) + '+in+' + encodeURIComponent(alt),
        headers: {
            'User-agent': 'Mozilla/4.0 (compatible) Greasemonkey'
        },
        onload: function(response) {
            processResponseData(unitElem, response.responseText, alt);
        }
    });
}

/**
 * Parses Google Calc's responses and creates an element containing the 
 * conversion in the form of an asterisk adjacent to the original unit
 * microformat. The generated HTML is unit-microformat compliant in itself,
 * but contains no target unit:
 *      <abbr title="[destination_unit_value]" class="unit [destination_unit]">
 *          <a title="[destination_unit_value] [destination_unit]">*</a>
 *      </abbr>
 *
 * @param unitElem  <abbr> DOM element representing a unit microformat.
 * @param html      HTML text with Google Calc's response to process.
 * @param altUnit   Destination unit name.
 */
function processResponseData(unitElem, html, altUnit) {
    var re = new RegExp(' = .+</b></td>', 'g');
    var matches = html.match(re);
    if (matches && matches.length > 0) {
        var value = parseHtmlNumber(matches[0]);
        var resultWithUnit = matches[0].split('</b>')[0];
        var badge = document.createElement('sup');
        badge.innerHTML = '<abbr title="' + value + '" class="unit ' + altUnit +
            '"><a title="' + value + ' ' + altUnit + '">*</a></abbr>';
        unitElem.parentNode.insertBefore(badge, unitElem.nextSibling);
    }
}

/**
 * Takes a number expressed as HTML text and returns the corresponding floating 
 * point number. Parsing assumes a single number, maybe with presentational HTML 
 * interspersed, perhaps containing some power of ten. Not expecting more than 
 * one separator sign (decimal, maybe "," or ".").
 *
 * @param html  HTML text containing a number.
 * @return      Parsed floating point number.
 */
function parseHtmlNumber(html) {
    var mantissa = 0;
    var exponent = 0;
    // Let's deal with the exponent first
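    if (html.indexOf('<sup>') > -1) {
        // Declared with var so it does not leak as an implicit global
        var supSplit = html.split('<sup>');
        // Explicit radix: the exponent is always a decimal integer
        exponent = parseInt(supSplit[1], 10);
        html = supSplit[0];
        if (isNaN(exponent)) {
            exponent = 0;
        }
    }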
    // Follow up with the mantissa
    html = html.replace(/<[^<>]+> ?/g, '')
        .replace(/[^0-9,.-]/g, ' ')
        .replace(/,/g, '.');
    mantissa = parseFloat(html);
    if (isNaN(mantissa)) {
        mantissa = 0;
    }
    return mantissa * Math.pow(10, exponent);
}

Here is a link to the UnitFormat script, for your convenience.

The code is rather rough around the edges, chiefly around the parseHtmlNumber(html) function (regexes, yuck!), but it seems to work in plenty of cases. A particular use for this dawned on me a millisecond after testing the first version of the script. Can you spot it? Until next time!

Greasemonkey and microformats (2)

Welcome back! Today I am going to rant —briefly, I promise— about microformats and try to cobble together a simple one to denote measurement units. So there!

As I said in the previous post, microformats are small pieces of self-significant HTML that may be used to embed automatically parseable information in a web page. They are just POSH, but not in the fake Port Out, Starboard Home way: any given microformat is a subset of HTML, and a significant subset (in a semantic way), at that! Let’s try to illustrate that with the simplest of microformats, rel-tag.

This microformat is so tiny it might be called a nanoformat instead. It consists of just one HTML attribute, the little-known rel. Valid for links (<a> and <link> tags), rel describes the relationship of the current document to the anchor specified in the href attribute of the tag. There is a bunch of suggested values in Section 6.12 of the HTML 4.01 specification, and all rel-tag does, syntax-wise, is add a value to that list: tag.

The general idea behind rel-tag is providing robots with an easy way to tag content, therefore adding some much needed common sense to searches. The tag itself can come from a variety of tag spaces, one of the most evident ones being Wikipedia. Here is a tag example (extracted from this very blog) using a self defined namespace:

<a rel="tag" href="http://brucknerite.net/search/label/javascript">javascript</a>
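In rel-tag, the tag itself is carried by the last path component of the link’s href. A minimal sketch of how a consumer might recover it, with the hypothetical name tagFromHref and the assumption that query strings, fragments and trailing slashes should be ignored:

```javascript
// Hypothetical helper: recover the tag from a rel-tag link target by taking
// the last path component of the URL.
function tagFromHref(href) {
    // Drop any fragment and query string, then any trailing slashes
    var path = href.split('#')[0].split('?')[0].replace(/\/+$/, '');
    var segment = path.substring(path.lastIndexOf('/') + 1);
    // Undo percent-encoding (throws URIError on malformed escapes)
    return decodeURIComponent(segment);
}
```

For the link above, tagFromHref returns javascript, whatever the visible link text happens to say.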

In the process of incubating a microformat, the first thing to do is to resist any urge to design for design’s sake. These wise words notwithstanding, I’d like to tell you about this little idea of mine: what about a microformat for measurement units? I don’t want to mislead anybody by asserting it hasn’t been proposed before, because it has. Trouble is, discussion on it stopped several months ago (more than eight, less than ten) without arriving at any significant consensus. I believe there is a real-world problem to solve, and not much done in the form of in-the-wild implementations. If anything is to gain any traction, it should be simple. Dead simple. What about this?

<abbr class="unit EUR" title="1320">1&thinsp;320&nbsp;&euro;</abbr>

It’s just a simple application of two recommended design patterns: class and abbr. The former seems to be pretty well established; the latter, however, has some controversy behind it. But I wouldn’t mind having a <span> element there, to be honest. An explanation could be of some help at this point.

The <abbr> element makes it possible, by means of its title attribute, to provide a machine-parseable value for the eminently presentational string inside. For (non-vision impaired) humans, the string appears as “1 320 €”; the contents of title, being a straight string representation of the value, allow robots to skip the complexities of the human mind (and, perhaps, to read the number aloud correctly).

The class attribute provides discerning parsers with two fundamental pieces of semantic information: it’s a unit, and the unit is a currency, namely euros. As a valid HTML class literal can be nearly anything (spaces and other small quirks excluded), I’d stay with unit names acceptable to one of the more popular unit name parsers out on the Net: Google Calculator. The rationale for this proposal, and the Greasemonkey bit, will be explained in a further article. See you!