January 1, 2005

Translating HTML / XML documents?

I'm wondering how this is done, other than with on-the-fly machine translation like that provided by babelfish.altavista.com or Google's language tools. No automated tool is as good as a human expert in the languages involved.

I'm not asking so much for an actual tool as an idea of how a UI for a manual translation tool might be designed. Ideally, I'd want a UI (preferably XUL) that guides the developer through the translation process.

This relates to my Abacus project, so my personal interest is in translating presentation MathML.

Posted by WeirdAl at January 1, 2005 11:19 PM
Comments

I am using xml2po and poedit to translate XML documents (DocBook). xml2po has built-in support for XHTML and DocBook, but it should be possible to extend it to support MathML.

See this post for details: http://weblogs.goshaky.com/weblogs/page/lars/20040823#translating_docbook_documents
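
To make the extract-then-translate idea concrete, here is a rough Python sketch of pulling translatable text out of an XML document and writing it as PO entries that an editor such as poedit can open. This is not how xml2po is actually implemented; the function names, the sample document, and the default list of translatable tags are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are DocBook-like but chosen for illustration.
SAMPLE = (
    "<article><title>Abacus</title>"
    "<para>Enter an <emphasis>expression</emphasis>.</para>"
    "<para>Abacus</para></article>"
)


def extract_messages(xml_text, translatable=("title", "para")):
    """Collect the text of the selected elements, in document order, without duplicates."""
    root = ET.fromstring(xml_text)
    messages = []
    for elem in root.iter():
        if elem.tag in translatable:
            # Flatten inline markup and normalise whitespace.
            text = " ".join("".join(elem.itertext()).split())
            if text and text not in messages:
                messages.append(text)
    return messages


def to_po(messages):
    """Render messages as PO entries with empty msgstr fields for the translator to fill in."""
    def quote(s):
        return s.replace("\\", "\\\\").replace('"', '\\"')

    return "\n".join(f'msgid "{quote(m)}"\nmsgstr ""\n' for m in messages)


if __name__ == "__main__":
    print(to_po(extract_messages(SAMPLE)))
```

For namespaced formats such as MathML, ElementTree reports tags in the form {namespace-uri}name, so the translatable list would need the qualified names. This sketch also drops inline markup, which a real tool would have to preserve.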

Posted by: Lars Trieloff at January 2, 2005 4:38 AM

Before drafting the UI spec, you need to know the design requirements. A few things I think computer-aided translation software should have:
1. An easy way to track how terms and headings have been translated.
2. The ability to see the original text alongside the translation.
3. An outline mode, with the ability to jump quickly between sections.
4. Customisable dictionaries and thesauri.
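
One way to picture these requirements is as a small data model sitting behind the UI. The Python sketch below is only an illustration: the class names, fields, and the German sample translations are invented here, and a real tool would need far more (persistence, fuzzy matching, inflected forms).

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    section: str      # heading the segment belongs to (requirement 3)
    source: str       # original text, always kept with the translation (requirement 2)
    target: str = ""  # translator's text, empty until translated


@dataclass
class Project:
    segments: list = field(default_factory=list)
    glossary: dict = field(default_factory=dict)  # source term -> agreed translation (1 and 4)

    def outline(self):
        """Section names in first-seen order, for jumping between sections."""
        return list(dict.fromkeys(s.section for s in self.segments))

    def glossary_hits(self, segment):
        """Glossary terms that occur in a segment's source text."""
        low = segment.source.lower()
        return {t: tr for t, tr in self.glossary.items() if t.lower() in low}

    def inconsistent(self):
        """Translated segments that use a glossary term but lack its agreed translation."""
        return [
            s for s in self.segments
            if s.target and any(
                tr.lower() not in s.target.lower()
                for tr in self.glossary_hits(s).values()
            )
        ]


if __name__ == "__main__":
    project = Project(glossary={"fraction": "Bruch"})
    project.segments = [
        Segment("Intro", "A fraction has two parts.", "Ein Bruch hat zwei Teile."),
        Segment("Intro", "Enter the fraction.", "Geben Sie die Zahl ein."),
        Segment("Usage", "Press Enter."),
    ]
    print(project.outline())
    print([s.source for s in project.inconsistent()])
```

The substring matching is deliberately naive; it is there only to show how a UI could flag a segment where an agreed term was translated differently.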

(From Alex: Thanks. I was rather hoping you knew of tools out there we could safely emulate, at least in the UI.)

Posted by: Daniel Wang at January 2, 2005 5:45 AM

"I'm not asking so much for an actual tool as an idea of how a UI for a manual translation tool might be designed."

Such domain-specific tools already exist in the closed-source world. I have a friend who is a translator, and she has bought software that helps her with various dictionary and autocomplete features.

If you want to design an open source document translation tool (why would it need to be browser-based?) then it might be good to find someone with an existing tool of that sort and look to it for inspiration.

Posted by: Gerv at January 3, 2005 3:09 AM