Desymbolizer

A friend of mine is learning Greek before he goes to theological college, and is getting in a mess over fonts. He keeps finding that if he copies and pastes some Greek, it magically turns into transliterated Latin letters. Guess what? It’s the Symbol font problem again.

It’s 4.30am and I’m awake and jet-lagged so, to help him and anyone else who doesn’t have Symbol installed but wants to read text written for it, I’ve written the Desymbolizer. You paste some text designed for Symbol into the text box, and it gives it back to you as the correct Unicode codepoints (encoded as HTML numeric entities).

You can try it out with text from greekbible.com.
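For the curious, the idea can be sketched in a few lines of Python. This is not the Desymbolizer's actual code, just a minimal illustration under some assumptions: the mapping table covers only the 24 ordinary lowercase letters as the Symbol font lays them out (no capitals, final sigma, accents or punctuation), the entities are written in decimal, and the names `SYMBOL_TO_GREEK` and `desymbolize` are made up for this example.

```python
# Partial Symbol-font layout: the Latin key on the left is drawn by Symbol
# as the Greek letter in the same position on the right.
# Illustrative subset only: lowercase letters, no final sigma, no accents.
LATIN = "abgdezhqiklmnxoprstufcyw"
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"
SYMBOL_TO_GREEK = dict(zip(LATIN, GREEK))


def desymbolize(text: str) -> str:
    """Replace each mapped Latin letter with the HTML numeric entity
    for the Greek letter Symbol would have displayed in its place."""
    out = []
    for ch in text:
        greek = SYMBOL_TO_GREEK.get(ch)
        if greek is None:
            # Unmapped characters (spaces, digits, capitals, ...) pass through.
            out.append(ch)
        else:
            # Decimal numeric character reference, e.g. &#955; for lambda.
            out.append(f"&#{ord(greek)};")
    return "".join(out)


if __name__ == "__main__":
    print(desymbolize("logos"))
```

Because the output is plain ASCII entities, it displays as Greek on a page in any encoding, with no Symbol font needed.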

6 thoughts on “Desymbolizer”

  1. If your friend is planning on doing any Greek scholarship along the way then it is probably wise to head over to the B-Greek discussion group and learn their transliteration scheme.

    It’s in their FAQ: http://www.ibiblio.org/bgreek/faq.txt

    There are also some standardized Greek fonts for use in journal publication. However, which font is used for Greek transcription depends on the journal.

    Yours,
    Matt

  2. Why entity codes and not the direct UTF-8 characters? Or am I missing something (which given my patchy knowledge of Unicode is entirely possible)?

  3. No particular reason – only that I didn’t want to think about how to make it possible to have an ISO-8859-1 form in a UTF-8 page (which it would have to be to use raw characters).

  4. I don’t think this is what your mate is after, but it’s worth knowing about anyway: this is a nifty (Word) font converter that converts Hebrew, Greek or Syriac text from a number of the main “legacy” TrueType fonts to the Unicode equivalents. I’ve done some pretty demanding stuff with it, and it works very well. (Usual disclaimers, YMMV, etc.)

  5. I wrote a bookmarklet to do much the same as this a few years ago. I don’t think I still have it, but it wouldn’t be hard to reconstruct. The advantage is that with one click it converts all the Symbol font text on a page to Unicode in place, and from then on you can copy and paste directly from the page without going through an intermediate stage.