July 25, 2006

Desymbolizer

A friend of mine is learning Greek before he goes to theological college, and is getting in a mess over fonts. He keeps finding that if he copies and pastes some Greek, it magically turns into transliterated Latin letters. Guess what? It's the Symbol font problem again.

It's 4.30am and I'm awake and jet-lagged, so to help him out (and anyone else who doesn't have Symbol installed but wants to read text written for it), I've written the Desymbolizer. You paste text designed for Symbol into the text box, and it gives it back to you as the correct Unicode codepoints (encoded as HTML numeric entities).

You can try it out with text from greekbible.com.
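
For the curious, the idea can be sketched in a few lines of Python. This is only an illustration, not the Desymbolizer's actual code: the table below is a partial one that assumes the standard Symbol layout for the unaccented lowercase letters (plus "V" for final sigma), and the names SYMBOL_TO_GREEK and desymbolize are made up for this example. Accents, breathings and capitals are not handled.

```python
# Partial map from the Latin letters typed for the Symbol font to the Greek
# letters they display as. Assumes the standard Symbol layout; "j" and "v"
# are left out because they map to variant forms rather than plain letters.
SYMBOL_TO_GREEK = dict(zip(
    "abcdefghiklmnopqrstuwxyz",
    "αβχδεφγηικλμνοπθρστυωξψζ",
))
SYMBOL_TO_GREEK["V"] = "ς"  # final sigma


def desymbolize(text):
    """Return text with known Symbol letters replaced by HTML numeric entities."""
    out = []
    for ch in text:
        greek = SYMBOL_TO_GREEK.get(ch)
        # Characters without a mapping (spaces, digits, punctuation) pass through.
        out.append(f"&#{ord(greek)};" if greek else ch)
    return "".join(out)


print(desymbolize("logoV"))
```

Pasting the printed entities into an HTML page should display the Greek word rather than "logoV", whether or not the reader has Symbol installed.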

Posted by gerv at July 25, 2006 1:43 PM
Comments

If your friend is planning on doing any Greek scholarship along the way then it is probably wise to head over to the B-Greek discussion group and learn their transliteration scheme.

It's in their FAQ: http://www.ibiblio.org/bgreek/faq.txt

There are also some standardized Greek fonts for use in publication in journals. However, which font is used for Greek transcription depends on the journal.

Yours,
Matt

Posted by: Matt Lemieux at July 25, 2006 3:28 PM

Why entity codes and not the direct UTF-8 characters? Or am I missing something (which given my patchy knowledge of Unicode is entirely possible)?

Posted by: Robin at July 25, 2006 3:54 PM

No particular reason - only that I didn't want to think about how to make it possible to have an ISO-8859-1 form in a UTF-8 page (which it would have to be to use raw characters).

Posted by: Gerv at July 25, 2006 4:30 PM

I don't think this is what your mate is after, but it's worth knowing about anyway: this is a nifty (Word) font converter that converts Hebrew, Greek or Syriac text from a number of the main "legacy" TrueType fonts to the Unicode equivalents. I've done some pretty demanding stuff with it, and it works very well. (Usual disclaimers, YMMV, etc.)

Posted by: David at July 25, 2006 8:16 PM

I wrote a bookmarklet to do much the same as this a few years ago. I don't think I still have it, but it wouldn't be hard to reconstruct. The advantage is that with one click it converts all the Symbol font text on a page to Unicode in place, and from then on you can copy and paste directly from the page without going through an intermediate stage.

Posted by: Simon Montagu at July 26, 2006 8:41 AM

Looks like somebody else had the same idea as me: http://everything2.com/index.pl?node_id=1689425

Posted by: Simon Montagu at July 26, 2006 8:53 AM