Have you done any of the computer science behind syntax highlighting? Have you studied context-free grammars? Have you read the sources of existing parsers, tokenisers and lexers, and of competent highlighting tools? Do you still think it's sensible to do syntax highlighting for non-regular grammars with regular expressions?
(From Alex: Whoever you are, you don't sound like a Mozilla developer. Our Mozilla guys at least sign their names.
I haven't done any of that, no, but I would appreciate links to such documents instead of rhetorical questions that basically call me an idiot, especially the last one.)
Posted by anon at May 5, 2006 8:40 PM
Alex: Maybe anon was a bit harsh, but he has a point. Regexp-based parsers suck, as anyone who has tried to write and maintain one knows.
You really ought to check how existing parsers and good highlighters are implemented before suggesting things like this.
Last, it seems that Daniel has been working on a decent syntax highlighter. I hope he'll share his code when it's done.
Posted by Nickolay Ponomarev at May 6, 2006 12:14 AM
I think syntax highlighting based on lexical analysis is the way to go. I think this is what most other syntax-highlighting editors do, except for super-sophisticated ones that actually parse and/or compile the source code on the fly, which is overkill for most applications.
Lexical analysis of a document basically means tokenization via regular expressions, plus a state machine (usually, but not necessarily, a finite state machine). Look at the flex documentation to see what lexers do.
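To make "regular expressions plus a state machine" concrete, here is a minimal sketch in Python, assuming a toy C-like language whose only multi-line construct is the `/* ... */` block comment. The token names, the `NORMAL`/`IN_COMMENT` states and the `lex_line`/`lex_document` functions are hypothetical illustrations, not flex's API (flex expresses the same idea with start conditions).

```python
import re

# Hypothetical token rules for a toy C-like language. At each position the
# alternatives are tried left to right and the first one that matches wins.
NORMAL_RULES = re.compile(
    r"(?P<ws>\s+)"
    r"|(?P<comment_start>/\*)"
    r"|(?P<line_comment>//.*)"
    r'|(?P<string>"(?:[^"\\]|\\.)*")'
    r"|(?P<number>\d+)"
    r"|(?P<ident>[A-Za-z_]\w*)"
    r"|(?P<punct>.)"
)
COMMENT_END = re.compile(r"\*/")
KEYWORDS = {"if", "else", "while", "return", "int"}

# The two lexer states: ordinary code, or inside an unterminated /* comment.
NORMAL, IN_COMMENT = "NORMAL", "IN_COMMENT"


def lex_line(line, state):
    """Tokenize one line, starting in `state`.

    Returns (tokens, end_state); each token is a (kind, text) pair, and the
    token texts concatenate back to the original line.
    """
    tokens, pos = [], 0
    while pos < len(line):
        if state == IN_COMMENT:
            end = COMMENT_END.search(line, pos)
            if end is None:  # comment runs past this line
                tokens.append(("comment", line[pos:]))
                return tokens, IN_COMMENT
            tokens.append(("comment", line[pos:end.end()]))
            pos, state = end.end(), NORMAL
            continue
        m = NORMAL_RULES.match(line, pos)
        kind, text = m.lastgroup, m.group()
        if kind == "comment_start":
            end = COMMENT_END.search(line, m.end())
            if end is None:  # unterminated: switch to the IN_COMMENT state
                tokens.append(("comment", line[pos:]))
                return tokens, IN_COMMENT
            tokens.append(("comment", line[pos:end.end()]))
            pos = end.end()
            continue
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
        pos = m.end()
    return tokens, state


def lex_document(text):
    """Lex a whole document line by line, threading the state between lines."""
    state, lines = NORMAL, []
    for line in text.split("\n"):
        tokens, state = lex_line(line, state)
        lines.append(tokens)
    return lines


for tokens in lex_document('int x = 1; /* start\nstill comment */ return x;'):
    print(tokens)
```

Because each line's tokens depend only on the line's text and the state it starts in, the state at a line boundary is all a highlighter needs to remember in order to resume lexing there.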
In Mozilla the way to implement this would probably be to make the document a bunch of spans. Each span represents one lexed token, and has an attribute saying what the token is. Periodically in the document you'd store checkpoints of the lexer state at that point in the document. When the document changes, you rerun lexing from the previous checkpoint, updating the document span structure as you go. Whenever you reach a new checkpoint, if the new state is the same as the checkpointed state, stop, otherwise continue to lex the next chunk of the document.
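The checkpoint scheme described above can be sketched in Python. Everything here is a hypothetical simplification, not Mozilla code: the document is a list of lines, every line boundary is a checkpoint (storing them only every N lines would just widen the relex window), the only lexer state is "inside a `/* */` comment", only single-line edits are handled, and each cached token stands in for one span in the document. `IncrementalHighlighter`, `edit_line` and `lex_line` are invented names.

```python
def lex_line(line, in_comment):
    """Toy lexer: split a line into ("code", ...) and ("comment", ...) tokens.

    The only lexer state is a bool: are we inside a /* ... */ comment?
    Returns (tokens, in_comment_at_end_of_line).
    """
    tokens, pos = [], 0
    while pos < len(line):
        if in_comment:
            end = line.find("*/", pos)
            if end == -1:
                tokens.append(("comment", line[pos:]))
                return tokens, True
            tokens.append(("comment", line[pos:end + 2]))
            pos, in_comment = end + 2, False
        else:
            start = line.find("/*", pos)
            if start == -1:
                tokens.append(("code", line[pos:]))
                return tokens, False
            if start > pos:
                tokens.append(("code", line[pos:start]))
            end = line.find("*/", start + 2)
            if end == -1:
                tokens.append(("comment", line[start:]))
                return tokens, True
            tokens.append(("comment", line[start:end + 2]))
            pos = end + 2
    return tokens, in_comment


class IncrementalHighlighter:
    """Caches tokens per line plus a checkpoint: the lexer state at each line start."""

    def __init__(self, text):
        self.lines = text.split("\n")
        self.tokens = [None] * len(self.lines)        # None = never lexed
        self.start_state = [False] * len(self.lines)  # the checkpoints
        self.relexed = 0                              # lines lexed by the last update
        self._relex_from(0)

    def _relex_from(self, i):
        self.relexed = 0
        state = self.start_state[i]
        while True:
            self.tokens[i], state = lex_line(self.lines[i], state)
            self.relexed += 1
            i += 1
            if i == len(self.lines):
                return
            if self.tokens[i] is not None and self.start_state[i] == state:
                return  # matches the stored checkpoint: later lines are still valid
            self.start_state[i] = state

    def edit_line(self, i, new_text):
        """Replace line i; earlier lines are untouched, so its checkpoint still holds."""
        self.lines[i] = new_text
        self._relex_from(i)
```

Adding a self-contained `/* note */` to a line relexes only that line, because the state at the next checkpoint is unchanged; opening an unterminated comment relexes every following line whose checkpoint flips.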
Posted by Robert O'Callahan at May 6, 2006 1:32 AM
The tricky part with syntax highlighting is that you need to cut the text into pieces, however that's done.
XPath selects nodes, not ranges. CSS selects elements; you can't even select individual text nodes. So however you cut those parts, you need to make DOM modifications to mark up the document fragments.
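The DOM surgery this implies can be sketched with the standard W3C DOM methods for splitting a text node and wrapping part of it in an element. The sketch uses Python's `xml.dom.minidom` as a stand-in for a browser DOM; `wrap_range` and the class names `keyword`/`number` are hypothetical, and an editor would make the equivalent calls from JavaScript.

```python
from xml.dom.minidom import parseString


def wrap_range(text_node, start, end, css_class):
    """Wrap characters [start, end) of a DOM Text node in <span class="...">.

    The text node is split so the range becomes a node of its own, and that
    node is then moved inside a new span element. Returns the span.
    """
    doc = text_node.ownerDocument
    middle = text_node.splitText(start)  # text_node keeps [0, start)
    middle.splitText(end - start)        # middle keeps [start, end); the rest follows it
    span = doc.createElement("span")
    span.setAttribute("class", css_class)
    middle.parentNode.replaceChild(span, middle)
    span.appendChild(middle)
    return span


doc = parseString("<pre>int x = 1;</pre>")
pre = doc.documentElement
wrap_range(pre.firstChild, 0, 3, "keyword")  # "int"
wrap_range(pre.lastChild, 5, 6, "number")    # "1" within " x = 1;"
print(pre.toxml())
```

Every highlighted token costs at least one text-node split and one new element, so rewrapping the whole document on each keystroke adds up quickly.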
I suggest reading a bit about syntax highlighting in Eclipse, too.
I'm not sure what the editor does in source view, or how its performance compares between rich and source mode.
Posted by Axel Hecht at May 6, 2006 2:02 AM
IIRC, Daniel Glazman was also working on something like this?
Posted by Ian at May 8, 2006 12:30 AM
Isn't this already done for the view-source windows? Though I suppose that could be done backwards from the already-parsed document.
(From Alex: That's done through an XSLT transformation. I'm considering this possibility too, but it's a bit painful.)
Posted by scratch at May 8, 2006 9:54 AM