Fragment Search

Fragment Search is a Greasemonkey script for Firefox which allows people to create URLs which link to content within a page without having control over that page.

Fragment identifiers in HTTP URIs have been historically used to link to a name or ID within a document’s markup. For example, http://www.mozilla.org/MPL/relicensing-faq.html#why-relicensing finds the HTML construct <a name="why-relicensing"> in the page in question.

This is historically true, although RFC 1738, the URL RFC, doesn’t actually seem to mandate it. Fragment identifiers are specced at a high level in RFC 3986, but that doesn’t say what they should be used for either. (If you know where this is specced, please let me know.)

But what do you do if the document doesn’t have a name or ID where you want to link, and you don’t have control of the document (as is the case for most people and most documents)? My suggestion is to extend the syntax of the fragment identifier using a character which is illegal in names/IDs (“!”), and which conforming browsers can use to trigger a textual search of the page contents. So, for example, http://www.gerv.net/#!s!design searches for the word “design” on the front page of gerv.net.
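As a rough illustration of the idea, here is a minimal Python sketch of how a client might recognise the proposed syntax and locate the text. It is not the Greasemonkey script itself: the function names `parse_search_fragment` and `find_in_page` are hypothetical, and the percent-decoding and case-sensitive matching are assumptions rather than anything the proposal specifies.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def parse_search_fragment(url: str) -> Optional[str]:
    """Return the search text from a '#!s!text' fragment, or None.

    Hypothetical helper: only the plain '!s!' form is handled here.
    """
    fragment = urlsplit(url).fragment
    prefix = "!s!"
    if not fragment.startswith(prefix):
        # An ordinary name/ID fragment, or no fragment at all.
        return None
    # Assumption: the search text may be percent-encoded in the URL.
    text = unquote(fragment[len(prefix):])
    return text or None


def find_in_page(page_text: str, search_text: str) -> int:
    """Return the offset of the first match, or -1 if there is none.

    Assumption: matching is case-sensitive and runs over plain page text.
    """
    return page_text.find(search_text)


if __name__ == "__main__":
    query = parse_search_fragment("http://www.gerv.net/#!s!design")
    print(query)
    print(find_in_page("Web design notes", query or ""))
```

Because “!” cannot appear in a name or ID, a URL such as http://www.mozilla.org/MPL/relicensing-faq.html#why-relicensing falls through to the ordinary anchor lookup and is unaffected.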

Clearly, this is just a proof of concept. It needs to be built into browsers to be generally useful. But please do try it out and let me know what you think.

Read more, install the script or extension, and try it out

13 thoughts on “Fragment Search”

  1. I’ve always wanted something like this, but my idea of using XPaths would have been a bit difficult for the average user :)

    It definitely would be useful (especially for conversations where the content of a page won’t change significantly between creation of the link and when it is followed…)

  2. I also thought about something like this. My thinking was of an extension that could:
    a) Give the user some way of identifying IDs and names on the document (a little icon beside the text that could be right-clicked and copied)
    b) As Gerv is describing, give them a way of linking to places without IDs or names. I also thought to use XPaths, which aren’t user-unfriendly if the extension could generate them itself.

  3. The obvious problem here is that search for a text might give multiple results and hence will not yield a unique anchor.

    An XPath approach doesn’t have that problem and, as Jason pointed out above, if it is handled by the extension – it doesn’t have to be user-unfriendly – e.g. highlight text and right-click to select “Copy direct link to this fragment”

    But representing an XPath (which can be seriously long) as a URL suffix might be a problem. One idea off the top of my head could be to use a combination: suffix the search text (as in the current script), followed by a hash of the (normalized) XPath.

    That way, the corresponding extension on the receiving-end can zero in on the exact location from within the search hits without the need to convey the complete XPath as part of the URL (at the cost of some processing power of course :| )

  4. Ah well – just saw some of your examples. Suffixing the search number works just as well of course. My bad for not looking at all the examples in detail first.

    Apologies….

  5. I must admit to having wished for this kind of capability myself.

    One improvement I might propose is the syntax #!s:options!string where the options are optional and the colon may be omitted if there are no options. Options could be used for things like case (in)sensitivity, or, perhaps more usefully, linking to the nth match if you specify a number as an option.

  6. Upon closer inspection, I see that you’ve already made provision for the nth match thing.

    The fingerprinting proposal also seems very good: although it wouldn’t be nearly as useful as the fragment search linking, it also wouldn’t get in the way of anything. And it would provide some real value: I must admit that I basically never bother to check downloaded files against published MD5 sums, not because I can’t figure out how, but because it’s currently several extra steps per file, and I’m just not quite that paranoid. I just assume they match. But if the browser (or, more likely, wget) could check automatically, I’d certainly let it do so, and if it told me that a file didn’t match its published sum, I’d be glad it had checked.

    So both proposals seem useful, but the search/match link seems *more* useful than the fingerprinting, from the perspective of a power user and web developer, because fundamentally it adds a capability that otherwise doesn’t exist; whereas, the fingerprinting just makes something more conveniently automatic.

  7. XPath is a possibility; however, I think that leveraging the browser’s “Find” capability is more understandable to people, and makes for much easier hand-creation of URLs. Yes, you can have programmatic support for creation of XPaths, but I think this proposal solves the 95% case while keeping it simple.

    Jonadab: We could have an extensible options syntax. I’m trying to keep things as simple as possible, though; the downside is that you would need to type something like #!s:m=1! rather than #!s1!. Also, the more letters, words or names we put into it, the more English it becomes. And that’s not i18n friendly. That’s one reason why I picked “s” rather than “search”. People for whom the word “search” doesn’t begin with “s” at least only have to remember one letter.

    While we techies like the idea of exposing all the options, and the completeness of “case-insensitive search”, I really don’t think it would get used in practice, for the fundamental reason that the person who is constructing the link knows exactly what they are linking to. It wouldn’t make the links more robust, either – how often does a document change only in the case of the letters?

  8. fantastic.

    So many documents are missing not only id info, but even div or p tags to separate content.

    It will be great if this kind of linking gets widespread adoption among browsers.

  9. Toufeeq: thanks for the tip. WebMarker is interesting, but their marks do have an (admittedly small) risk of clashing with existing anchors in the page. Perhaps I can persuade them to adopt my syntax.
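
For anyone who wants to experiment with the options syntax proposed in comment 5 (#!s:options!string, with the colon omitted when there are no options) and the #!s:m=1! example from comment 7, a parser could look roughly like the Python sketch below. The function name `parse_with_options` is hypothetical, and the comma separator between options is purely an assumption; the thread only ever shows a single option. The shorter #!s1! form is deliberately not handled.

```python
from typing import Dict, Optional, Tuple


def parse_with_options(fragment: str) -> Optional[Tuple[Dict[str, str], str]]:
    """Parse '!s!text' or '!s:key=value,...!text' (without the leading '#').

    Returns (options, search_text), or None if the fragment does not use
    the search syntax. Hypothetical sketch, not the script's real parser.
    """
    if not fragment.startswith("!s"):
        return None
    rest = fragment[2:]

    # No options: the colon is omitted and the text follows the second '!'.
    if rest.startswith("!"):
        return {}, rest[1:]

    if not rest.startswith(":"):
        return None

    # Everything up to the next '!' is the option list; any later '!'
    # belongs to the search text itself.
    option_part, separator, text = rest[1:].partition("!")
    if not separator:
        return None

    options: Dict[str, str] = {}
    for pair in option_part.split(","):  # assumption: comma-separated
        if not pair:
            continue
        key, _, value = pair.partition("=")
        options[key] = value
    return options, text


if __name__ == "__main__":
    print(parse_with_options("!s!design"))
    print(parse_with_options("!s:m=1!design"))
```

Keeping the option keys to single letters, as in m=1, fits the i18n concern raised in comment 7.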