| [INFO] | Channel view for “#developers” opened. |
| -->| | YOU (WeirdAl) have joined #developers |
| =-= | Topic for #developers is “Try http://landfill.mozilla.org/mxr-test || <Pike> "support" is such a strong word || editor is the new parser” |
| =-= | Topic for #developers was set by bz on 05/04/06 20:58:42 |
| =-= | rhelmer-afk is now known as rhelmer-zzz |
| -->| | Hendy (wolfox@moz-A208670A.nsw.bigpond.net.au) has joined #developers |
| |<-- | Hendikins has left moznet (NickServ (GHOST command used by Hendy)) |
| =-= | Hendy is now known as Hendikins |
| <db48x> | WeirdAl: the commentor on your site was maybe a little rude, but syntax highlighting is _hard_ |
| <WeirdAl> | I know it's not easy. If it were easy, they'd've done it for MozAppSuite 1.0. |
| <WeirdAl> | we're still over a year from Gecko 1.9, there's time if we start now |
| <db48x> | WeirdAl: the only way to do it correctly is to have a full parser for the language you're building that gives each part of the text a name, then associate the names with colros |
| <db48x> | colors |
| <db48x> | the last part is very easy, of course |
| * WeirdAl | shakes his head |
| <WeirdAl> | one per language is a bit infeasible |
| <WeirdAl> | that's why I suggested regular expressions. |
| <db48x> | so xml syntax highlighting shouldn't be too difficult, we do after all have an xml parser handy |
| <db48x> | regular expressions will never work |
| <WeirdAl> | never? |
| <db48x> | never |
| <db48x> | I'll show you an example |
| <WeirdAl> | careful, I'm not really awake right now :) |
| -->| | Standard8 (mark@moz-BD7E5647.demon.co.uk) has joined #developers |
| * WeirdAl | waits for the example |
| >db48x< | I have an idea on how it might be possible with what we have now, except for one detail |
| <db48x> | hmm, I had a really good example on mxr-test, but now I don't recall which file it was |
| <WeirdAl> | mind if I talk with you off this channel? |
| <db48x> | oops: "Unexpected returnvalue 2 from Glimpse" |
| <db48x> | anyway, the point is that there will always be problems unless you have a tokeniser/lexer for the language |
| <db48x> | but for xml languages we do have that |
| -->| | brosnan (brosnan@moz-738FF14C.ri.ri.cox.net) has joined #developers |
| <WeirdAl> | exactly: we have it for XML, we have it for JS (sort of), we have it for CSS (sort of), we have it for HTML |
| <WeirdAl> | these are the main types I'm thinking about |
| <WeirdAl> | I'm not talking Ruby here |
| <db48x> | yea, as long as we have a parser for it, and the parser keeps track of what source text each item in its parse tree corresponds to, you're golden |
| <WeirdAl> | keeping track, that's the hard part |
| =-= | rhelmer-zzz is now known as rhelmer |
| <WeirdAl> | that's the part I have no idea how to do |
| <db48x> | no, the parser has that information as it goes through the document to create the dom nodes, or css style rules or whatever |
| <db48x> | it just has to record it |
| <WeirdAl> | except maybe through DOM ranges |
| <WeirdAl> | :-/ that means memory, and memory for a program isn't cheap |
| <db48x> | and you want to be able to turn that recording on and off, so you only do it for documents you're editing, because it would increase Tp otherwise |
| |<-- | auswerk has left moznet (Quit: Trillian (http://www.ceruleanstudios.com) |
| <WeirdAl> | right |
| <db48x> | eh, two numbers per dom node is no big deal |
| <WeirdAl> | hehe |
| <WeirdAl> | but we deal in hundreds, thousands of DOM nodes |
| <db48x> | so? |
| <db48x> | don't worry about that |
| <WeirdAl> | a billion here, a billion there, soon you're talking real money |
| <WeirdAl> | anyway |
| <db48x> | you can't worry about that before you actually implement something |
| <WeirdAl> | so for language-specific issues, it's no big deal. Why is it hard/impossible for a regexp-based system? |
| <WeirdAl> | if the node locations corresponding to source text are already known |
| <db48x> | because until your set of regular expressions is complete, it won't highlight the text correctly in all situations. once it's complete it's equivalent to a full parser for the language |
| <db48x> | and it's also unmaintainable |
| <WeirdAl> | *sigh* |
| =-= | Mossop is now known as Mossop_Away |
| <WeirdAl> | That's like Godel's incompleteness theorem. It doesn't have to be perfect, for crying out loud |
| <db48x> | if it's not perfect it makes you less productive |
| <WeirdAl> | believe me, I'll be providing tools for devs to adjust the set of reg exp's |
| <db48x> | or I should say it makes you less productive whenever it's wrong |
| <WeirdAl> | or whatever syntax highlighting guidelines there are |
| <db48x> | WeirdAl: have you ever seen the regular expression that parses email addresses? |
| <sp3000> | email addresses aren't even recursive ;) |
| <WeirdAl> | I think I've written a couple variants for JS, about five or six years ago |
| <db48x> | http://ex-parrot.com/~pdw/Mail-RFC822-Address.html |
| * WeirdAl | blinks |
| <db48x> | there's a simplified one that only works if you've already stripped out comments |
| * WeirdAl | pauses, collecting his thoughts |
| <WeirdAl> | (congrats, you really knocked me off my pedestal there - I'm having to regroup) |
| <WeirdAl> | -- my instinctive reaction is to say that I'm not above having a two-phase regular expression evaluation, but that completely defeats both my original proposal and simple design/editing of the syntax highlighting |
| <WeirdAl> | db48x: I am completely, completely open to any method of syntax highlighting, with a preference on the ability to customize. |
| <WeirdAl> | -- that is, someone who's got an itch to scratch should be able to reach it |
| <db48x> | WeirdAl: here's what you have to do. modify the parser (just start with xml, it'll be easiest) so that you can tell it to record character offsets for each tag |
| <db48x> | and attribute, etc |
| <WeirdAl> | then figure out an on/off switch, defaulting to off? :) |
| <db48x> | once you have it doing that, you can use that data to build a second dom tree with nodes like <span class="tag"><window></span> |
| <db48x> | each language has a different set of classes |
| <WeirdAl> | I follow you so far |
| <db48x> | for xml you've got things like tag, attribute, attribute value, cdata delimiter, cdata data, etc |
| <WeirdAl> | (and I hope you put this in my blog) |
| <db48x> | then you just have a css stylesheet that determines what color/font style etc goes with what class |
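
The approach db48x describes above (record character offsets per tag and attribute, then wrap each piece in a span whose class a stylesheet can color) could be sketched roughly as follows. This is only an illustration, not Mozilla's parser: `lex_xml`, `highlight`, and the class names `text` and `attribute-value` are hypothetical, and the lexer handles only a small subset of XML (no comments, CDATA, or entities).

```python
import html
from typing import Iterable, Iterator, Tuple

# (class name, start offset, end offset); the class names are assumptions.
Token = Tuple[str, int, int]

NAME_EXTRA = ":_-."


def is_name_char(ch: str) -> bool:
    return ch.isalnum() or ch in NAME_EXTRA


def lex_xml(src: str) -> Iterator[Token]:
    """Hand-written lexer for a small XML subset that records offsets."""
    i, n = 0, len(src)
    while i < n:
        if src[i] != "<":
            # Character data runs up to the next tag (or end of input).
            j = src.find("<", i)
            j = n if j == -1 else j
            yield ("text", i, j)
            i = j
            continue
        # Tag opener plus name, e.g. "<window" or "</window".
        j = i + 1
        if j < n and src[j] == "/":
            j += 1
        while j < n and is_name_char(src[j]):
            j += 1
        yield ("tag", i, j)
        i = j
        # Inside the tag: attribute names and quoted values until ">".
        while i < n and src[i] != ">":
            if src[i] in "\"'":
                j = src.find(src[i], i + 1)
                j = n if j == -1 else j + 1
                yield ("attribute-value", i, j)
            elif is_name_char(src[i]):
                j = i
                while j < n and is_name_char(src[j]):
                    j += 1
                yield ("attribute", i, j)
            else:
                j = i + 1  # whitespace, "=", "/": left unclassified
            i = j
        if i < n:
            yield ("tag", i, i + 1)  # the closing ">"
            i += 1


def highlight(src: str, tokens: Iterable[Token]) -> str:
    """Build the 'second document': escaped source wrapped in class spans."""
    out, pos = [], 0
    for cls, start, end in tokens:
        out.append(html.escape(src[pos:start]))  # unclassified gap
        out.append(f'<span class="{cls}">{html.escape(src[start:end])}</span>')
        pos = end
    out.append(html.escape(src[pos:]))
    return "".join(out)


source = '<window id="main">hi</window>'
tokens = list(lex_xml(source))
highlighted = highlight(source, tokens)
```

A stylesheet with one rule per class (for example a rule for `.tag` and another for `.attribute`) would then decide the colors, so the same `highlight` step could serve any language whose lexer emits the same kind of tuples.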
| <WeirdAl> | db48x: there's just one little catch. |
| <WeirdAl> | what you're talking about is /static/ |
| <db48x> | how so? you can edit a stylesheet |
| <WeirdAl> | no, that's not what I mean |
| <db48x> | you can reparse a document |
| <db48x> | you have to reparse the document anyway as it gets edited |
| <WeirdAl> | reparsing is something I have thought of, but it seems like overkill |
| <WeirdAl> | (aside from the possibility that plaintext editor already reparses on every input) |
| <db48x> | currently compose reparses whenever you go from the source view to any of the rendered views |
| <db48x> | and all editing commands just mutate the existing document |
| <WeirdAl> | I'm talking about within a <xul:editor/>, by keystroke |
| <db48x> | but it wouldn't be hard to calculate the new offset data when mutating the document |
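
One way to read "calculate the new offset data when mutating the document" is to shift the recorded spans whenever text is inserted, instead of reparsing. The sketch below assumes spans are stored as `(class, start, end)` tuples and covers only insertions; `shift_spans` is a hypothetical helper, and deletions or edits that create new nodes would still need a reparse.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (class name, start offset, end offset)


def shift_spans(spans: List[Span], edit_pos: int, delta: int) -> List[Span]:
    """Return spans adjusted for `delta` characters inserted at `edit_pos`."""
    adjusted = []
    for cls, start, end in spans:
        if edit_pos <= start:
            # Insertion before the span: move the whole span right.
            adjusted.append((cls, start + delta, end + delta))
        elif edit_pos < end:
            # Insertion strictly inside the span: the span grows.
            adjusted.append((cls, start, end + delta))
        else:
            # Insertion at or after the span's end: nothing changes.
            adjusted.append((cls, start, end))
    return adjusted


spans = [("tag", 0, 7), ("attribute", 8, 10), ("attribute-value", 11, 17)]
inside_value = shift_spans(spans, edit_pos=13, delta=3)
before_attribute = shift_spans(spans, edit_pos=8, delta=2)
```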
| <WeirdAl> | I really am not awake right now. You're saying good stuff, the right stuff, but I'm having trouble grokking it at 2:40 am |
| <db48x> | actually, you wouldn't even have to do that |
| <db48x> | once you have a highlighted plain text view of the source for the user to edit |
| <db48x> | a lot of the simple editing doesn't change that highlighting |
| <WeirdAl> | ... I don't suppose I could talk you into drawing up a guideline or spec for me to follow? :) |
| <db48x> | just read the log of this tomorrow |
| <WeirdAl> | it sounds like a lot, from my standpoint |
| <db48x> | hmm, for xml at least, only adding or removing nodes changes the highlighting |
| <db48x> | WeirdAl: well, look at it this way. if you have a collection of regular expressions that determine the highlighting, you have to run those expressions over the document each time the user types something, right? |
| <WeirdAl> | at least over the portion where they're writing |
| <db48x> | WeirdAl: no, you have to do it over the whole document |
| <WeirdAl> | that said, I was planning to allow for a two-second delay between typing and redrawing |
| <db48x> | otherwise you run into syncing issues |
| <db48x> | like, let's say you only run it over the part that's within 5 lines either way of the cursor |
| <db48x> | but they're editing a long string that covers multiple lines |
| <WeirdAl> | and we get a regexp greater than five lines, I know |
| <WeirdAl> | that too |
| <db48x> | if the beginning of the string goes out of that range, it'll start highlighting the text as if it were code |
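
The syncing problem described above can be reproduced with a toy example. The snippet assumes a made-up language in which a double-quoted string may span lines, and a single hypothetical pattern `STRING_RE`; the sizes are shrunk so the "window" starts one line below the string's opening quote.

```python
import re

# Hypothetical string pattern: a double quote, anything but quotes, a double quote.
STRING_RE = re.compile(r'"[^"]*"')


def string_spans(text, offset=0):
    """Return (start, end) offsets of everything the pattern calls a string."""
    return [(m.start() + offset, m.end() + offset) for m in STRING_RE.finditer(text)]


lines = [
    'var msg = "line one',   # the string opens here
    'line two',
    'line three";',          # and closes here
    'var n = count + 1;',    # ordinary code
    'var s = "ok";',
]
doc = "\n".join(lines)

# Whole document: the pattern pairs the quotes correctly.
full = string_spans(doc)

# Window that starts on the second line, after the opening quote has scrolled out.
window_start = len(lines[0]) + 1
partial = string_spans(doc[window_start:], window_start)

full_text = [doc[s:e] for s, e in full]
partial_text = [doc[s:e] for s, e in partial]
```

Over the whole document the two strings are found; inside the window the closing quote of the long string is paired with the opening quote of `"ok"`, so the code line in between is reported as string contents.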
| <db48x> | parsing a document doesn't actually take that long |
| <WeirdAl> | I suspect JS-based data won't cut it |
| <db48x> | unless it's really big |
| -->| | Peter6 (Peter6@moz-397FC4D5.speed.planet.nl) has joined #developers |
| <db48x> | so if you want, you can just measure the time it takes to highlight the document the first time, and base future actions off of that |
| <WeirdAl> | hm, yeah |
| =-= | Peter6 is now known as Peter6_away |
| <db48x> | so if they're editing some enormous file and it takes a minute to parse and highlight, you probably don't want to do it on every keystroke |
| <db48x> | in fact, at some point an infobar saying "this document is so big that I'm not even going to try to highlight it" would be nice |
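
The idea of timing the first highlight pass and basing later behavior on it might look something like this. The function names, the policy labels, and both thresholds are assumptions picked for illustration; nothing in the discussion fixes actual numbers.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def choose_policy(seconds, per_keystroke_limit=0.05, give_up_limit=5.0):
    """Pick a rehighlighting policy from the cost of the first full pass.

    Both limits are hypothetical defaults and would need tuning.
    """
    if seconds <= per_keystroke_limit:
        return "every-keystroke"   # cheap enough to redo on each edit
    if seconds <= give_up_limit:
        return "after-pause"       # wait until the user stops typing
    return "disabled"              # show an infobar instead of highlighting


# Stand-in for a real highlighter: any callable taking the document text works.
_, first_pass = timed(str.upper, "<window/>" * 1000)
policy = choose_policy(first_pass)
```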
| <WeirdAl> | db48x: I've been thinking I want to use xpcom to feed the parameters in for highlighting, and then (within a C++-based component) cache the entries so we don't go back to xpcom for the data repeatedly |
| <db48x> | nice |
| <db48x> | WeirdAl: for an xml document you should store that info on the dom tree |
| <db48x> | a similar structure is available for css, I'm sure |
| <db48x> | I don't know how accessible the parse tree for js is though |
| <db48x> | that's a question for brendan |
| <WeirdAl> | js would be a bear |
| <WeirdAl> | as would C++, but I'm not ready to tackle that yet :) |
| <db48x> | WeirdAl: well, the cool thing is that once you have this working for xml, you can get it working for the other languages pretty easily |
| <WeirdAl> | I know. |
| <db48x> | the part that generates the highlighted document is the same for all of them |
| <WeirdAl> | Right now I'm trying to imagine an API or set of IDL's for this, and I'm drawing a complete blank. |
| <db48x> | so that's cool |
| <WeirdAl> | - usually, when I can do the IDL's, that means I've got a good understanding of the basics :) |
| <db48x> | I'd start by writing a highlighter that took offset data from a dom tree |
| <db48x> | you can do that in js |
| <WeirdAl> | I'd rather do it in C++, just to minimize perf issues. I need more experience coding in C++ anyway. |
| <db48x> | once you've done that you'll know whether you want the css data to be presented as a tree in the same fashion, or if you want to have some other data structure that they'll all use |
| <db48x> | sure, if you want |
| <db48x> | but I'm sorta talking about building one that you might throw away |
| <WeirdAl> | hehe |
| <db48x> | just to take a first lick at it and see how it looks once you've done tht |
| <db48x> | hat |
| <db48x> | that |
| <WeirdAl> | well, what can be written in JS can be written in C++, and I know JS pretty darned well |
| <db48x> | exactly |
| <db48x> | you might even find that the js version is pretty darn fast already |
| <WeirdAl> | probably. Everyone "knows" JS is 10x slower than C++, but only as a generalization that probably isn't true anymore |
| <db48x> | yea |
| <db48x> | it's only slow if you're modifying the dom and displaying it a bunch of times per second |
| <WeirdAl> | okay, I'm going to save what we've got here, and I'll post it to my blog as notes |
| <db48x> | like dhtml animations |
| <WeirdAl> | whoa, wait a sec... I thought that's what reparsing on every keystroke was: modifying & displaying repeatedly |
| <db48x> | yea, but not really |
| * WeirdAl | raises a skeptical eyebrow |
| <WeirdAl> | oh, what the hell, plaintext editor is so fast that I barely notice it anyway |
| <WeirdAl> | if I notice it at all |
| =-= | rhelmer is now known as rhelmer-zzz |
| <db48x> | I mean, you're not looking for a 60 fps here |
| * db48x | inserts 'game' in there somewhere |
| <WeirdAl> | heh, no, humans can't type that fast |
| <db48x> | I can |
| <db48x> | sometimes |
| <WeirdAl> | 60 per second, 60 seconds per minute? That's 3600 WPM |
| <db48x> | but you're still not likely to be doing this once per second |
| <db48x> | oh, yea. it usually is measured in wpm |
| <WeirdAl> | oops, 3600/5 |
| <WeirdAl> | 5 chars to the word |
| <db48x> | I was thinking 60? yea, I can do 60 wpm |