URL irc://moznet/developers Mode +snr Users 119, 5@, 0%, 0+
#developers
-->|YOU (WeirdAl) have joined #developers
=-=Topic for #developers is “Try http://landfill.mozilla.org/mxr-test || <Pike> "support" is such a strong word || editor is the new parser”
=-=Topic for #developers was set by bz on 05/04/06 20:58:42
=-=rhelmer-afk is now known as rhelmer-zzz
-->|Hendy (wolfox@moz-A208670A.nsw.bigpond.net.au) has joined #developers
|<--Hendikins has left moznet (NickServ (GHOST command used by Hendy))
=-=Hendy is now known as Hendikins
<db48x>WeirdAl: the commenter on your site was maybe a little rude, but syntax highlighting is _hard_
<WeirdAl>I know it's not easy. If it were easy, they'd've done it for MozAppSuite 1.0.
<WeirdAl>we're still over a year from Gecko 1.9, there's time if we start now
<db48x>WeirdAl: the only way to do it correctly is to have a full parser for the language you're building that gives each part of the text a name, then associate the names with colors
<db48x>the last part is very easy, of course
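The "easy last part" db48x describes is just a lookup from token names to colors. A minimal sketch, assuming the parser has already produced (class, text) pairs; the class names, colors, and the `colorize` function are hypothetical, and the token list is hard-coded in place of real parser output:

```python
# Hypothetical class-to-color table. In the scheme described above, the
# (class, text) pairs would come from the language's own parser.
COLORS = {
    "tag": "#800080",
    "attribute": "#008000",
    "attribute-value": "#0000ff",
    "text": "#000000",
}

def colorize(tokens):
    """Map each (class, text) token to (text, color); unknown classes get the default."""
    return [(text, COLORS.get(cls, COLORS["text"])) for cls, text in tokens]

# Hard-coded stand-in for parser output on: <window id="main">
tokens = [("tag", "<window"), ("text", " "), ("attribute", "id"), ("text", "="),
          ("attribute-value", '"main"'), ("tag", ">")]
print(colorize(tokens))
```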
* WeirdAl shakes his head
<WeirdAl>one per language is a bit infeasible
<WeirdAl>that's why I suggested regular expressions.
<db48x>so xml syntax highlighting shouldn't be too difficult, we do after all have an xml parser handy
<db48x>regular expressions will never work
<WeirdAl>never?
<db48x>never
<db48x>I'll show you an example
<WeirdAl>careful, I'm not really awake right now :)
-->|Standard8 (mark@moz-BD7E5647.demon.co.uk) has joined #developers
* WeirdAl waits for the example
>db48x<I have an idea on how it might be possible with what we have now, except for one detail
<db48x>hmm, I had a really good example on mxr-test, but now I don't recall which file it was
<WeirdAl>mind if I talk with you off this channel?
<db48x>oops: "Unexpected returnvalue 2 from Glimpse"
<db48x>anyway, the point is that there will always be problems unless you have a tokeniser/lexer for the language
<db48x>but for xml languages we do have that
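The example db48x had in mind never surfaces in the log. As a hypothetical illustration of the same point (not his example), here is a naive tag regex of the kind a regex-based XML highlighter might start with; it knows nothing about comments or CDATA sections, so it colors text inside them as tags:

```python
import re

# A deliberately naive "tag" pattern: '<', anything but '>', then '>'.
TAG_RE = re.compile(r"<[^>]+>")

def naive_tags(source):
    """Return every substring the naive regex would color as a tag."""
    return TAG_RE.findall(source)

doc = '<p><!-- old: <b>bold</b> --><![CDATA[ if (a <b> c) ]]></p>'
# The comment is split in two, "</b>" inside it is colored as a real tag,
# and the CDATA section is cut off at the "<b>" in its data.
print(naive_tags(doc))
```

Patching this (a comment rule, a CDATA rule, ordering between them) is the slide toward "a full parser for the language" that db48x describes.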
-->|brosnan (brosnan@moz-738FF14C.ri.ri.cox.net) has joined #developers
<WeirdAl>exactly: we have it for XML, we have it for JS (sort of), we have it for CSS (sort of), we have it for HTML
<WeirdAl>these are the main types I'm thinking about
<WeirdAl>I'm not talking Ruby here
<db48x>yea, as long as we have a parser for it, and the parser keeps track of what source text each item in its parse tree corresponds to, you're golden
<WeirdAl>keeping track, that's the hard part
=-=rhelmer-zzz is now known as rhelmer
<WeirdAl>that's the part I have no idea how to do
<db48x>no, the parser has that information as it goes through the document to create the dom nodes, or css style rules or whatever
<db48x>it just has to record it
<WeirdAl>except maybe through DOM ranges
<WeirdAl>:-/ that means memory, and memory for a program isn't cheap
<db48x>and you want to be able to turn that recording on and off, so you only do it for documents you're editing, because it would increase Tp otherwise
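Recording offsets "as it goes" is cheap because the parser already knows its position. Gecko's own parser is not shown here; as a stand-in, Python's expat binding exposes `CurrentByteIndex`, which inside a handler gives the offset where the current event (here, a start tag) begins. A minimal sketch with a hypothetical `record` switch that defaults to on here only for demonstration:

```python
import xml.parsers.expat

def element_offsets(xml_text, record=True):
    """Parse xml_text and return (tag name, byte offset) for each start tag.

    With record=False nothing is stored, mimicking an on/off switch so
    documents that are only being rendered pay no extra cost.
    Note: offsets are in bytes of the UTF-8 input, which equal character
    offsets only for ASCII text.
    """
    offsets = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        if record:
            # Offset of the '<' that opened this start tag.
            offsets.append((name, parser.CurrentByteIndex))

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return offsets

print(element_offsets('<window><box id="a"/></window>'))
```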
|<--auswerk has left moznet (Quit: Trillian (http://www.ceruleanstudios.com)
<WeirdAl>right
<db48x>eh, two numbers per dom node is no big deal
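A rough check of "two numbers per dom node", assuming each offset is a 32-bit integer ($b = 4$ bytes) and a document of $N = 10\,000$ nodes (both figures are illustrative, not measured):

```latex
\text{memory} = 2 \times b \times N
  = 2 \times 4\ \text{bytes} \times 10\,000\ \text{nodes}
  = 80\,000\ \text{bytes} \approx 80\ \text{kB}
```

Even a million-node document would need about 8 MB under the same assumptions, and only while the recording switch is on.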
<WeirdAl>hehe
<WeirdAl>but we deal in hundreds, thousands of DOM nodes
<db48x>so?
<db48x>don't worry about that
<WeirdAl>a billion here, a billion there, soon you're talking real money
<WeirdAl>anyway
<db48x>you can't worry about that before you actually implement something
<WeirdAl>so for language-specific issues, it's no big deal. Why is it hard/impossible for a regexp-based system?
<WeirdAl>if the node locations corresponding to source text are already known
<db48x>because until your set of regular expressions is complete, it won't highlight the text correctly in all situations. once it's complete it's equivalent to a full parser for the language
<db48x>and it's also unmaintainable
<WeirdAl>*sigh*
=-=Mossop is now known as Mossop_Away
<WeirdAl>That's like Gödel's incompleteness theorem. It doesn't have to be perfect, for crying out loud
<db48x>if it's not perfect it makes you less productive
<WeirdAl>believe me, I'll be providing tools for devs to adjust the set of reg exp's
<db48x>or I should say it makes you less productive whenever it's wrong
<WeirdAl>or whatever syntax highlighting guidelines there are
<db48x>WeirdAl: have you ever seen the regular expression that parses email addresses?
<sp3000>email addresses aren't even recursive ;)
<WeirdAl>I think I've written a couple variants for JS, about five or six years ago
<db48x>http://ex-parrot.com/~pdw/Mail-RFC822-Address.html
* WeirdAl blinks
<db48x>there's a simplified one that only works if you've already stripped out comments
* WeirdAl pauses, collecting his thoughts
<WeirdAl>(congrats, you really knocked me off my pedestal there - I'm having to regroup)
<WeirdAl>-- my instinctive reaction is to say that I'm not above having a two-phase regular expression evaluation, but that completely defeats both my original proposal and simple design/editing of the syntax highlighting
<WeirdAl>db48x: I am completely, completely open to any method of syntax highlighting, with a preference on the ability to customize.
<WeirdAl>-- that is, someone who's got an itch to scratch should be able to reach it
<db48x>WeirdAl: here's what you have to do. modify the parser (just start with xml, it'll be easiest) so that you can tell it to record character offsets for each tag
<db48x>and attribute, etc
<WeirdAl>then figure out an on/off switch, defaulting to off? :)
<db48x>once you have it doing that, you can use that data to build a second dom tree with nodes like <span class="tag">&lt;window&gt;</span>
<db48x>each language has a different set of classes
<WeirdAl>I follow you so far
<db48x>for xml you've got things like tag, attribute, attribute value, cdata delimiter, cdata data, etc
<WeirdAl>(and I hope you put this in my blog)
<db48x>then you just have a css stylesheet that determines what color/font style etc goes with what class
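The pipeline above (offsets from the parser, a second tree of `<span class="...">` nodes, a stylesheet keyed on class) can be sketched as a string renderer. This assumes spans arrive as sorted, non-overlapping (class, start, end) character offsets; the class names, the stylesheet, and `render` are hypothetical, and a real implementation would build DOM nodes rather than an HTML string:

```python
from html import escape

# Hypothetical per-language stylesheet: the only place colors live.
STYLESHEET = """
.tag       { color: purple; }
.attribute { color: green; }
.value     { color: blue; }
"""

def render(source, spans):
    """Wrap each (cls, start, end) span of source in <span class="cls">.

    spans must be sorted and non-overlapping; text between spans is
    emitted escaped but unwrapped.
    """
    out, pos = [], 0
    for cls, start, end in spans:
        out.append(escape(source[pos:start]))
        out.append('<span class="%s">%s</span>' % (cls, escape(source[start:end])))
        pos = end
    out.append(escape(source[pos:]))
    return "".join(out)

src = '<window id="main">'
spans = [("tag", 0, 7), ("attribute", 8, 10), ("value", 11, 17), ("tag", 17, 18)]
print(render(src, spans))
```

Keeping colors out of the renderer means restyling is a stylesheet edit, and only the span producer differs per language.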
<WeirdAl>db48x: there's just one little catch.
<WeirdAl>what you're talking about is /static/
<db48x>how so? you can edit a stylesheet
<WeirdAl>no, that's not what I mean
<db48x>you can reparse a document
<db48x>you have to reparse the document anyway as it gets edited
<WeirdAl>reparsing is something I have thought of, but it seems like overkill
<WeirdAl>(aside from the possibility that plaintext editor already reparses on every input)
<db48x>currently Composer reparses whenever you go from the source view to any of the rendered views
<db48x>and all editing commands just mutate the existing document
<WeirdAl>I'm talking about within a <xul:editor/>, by keystroke
<db48x>but it wouldn't be hard to calculate the new offset data when mutating the document
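Updating offsets on a mutation instead of reparsing amounts to shifting everything after the edit point. A minimal sketch, assuming offsets are stored as (class, start, end) tuples; `shift_spans` is a hypothetical helper, and edits that cross a span boundary are deliberately left out because that is where a reparse would be needed:

```python
def shift_spans(spans, pos, delta):
    """Adjust (cls, start, end) spans after inserting delta chars at pos
    (use a negative delta for a deletion that starts at pos).

    Spans ending at or before pos are untouched, spans starting at or
    after pos move by delta, and a span containing pos grows or shrinks.
    Deletions that cross a span boundary are not handled.
    """
    shifted = []
    for cls, start, end in spans:
        if end <= pos:        # edit lies after this span
            shifted.append((cls, start, end))
        elif start >= pos:    # edit lies before this span
            shifted.append((cls, start + delta, end + delta))
        else:                 # edit lies inside this span
            shifted.append((cls, start, end + delta))
    return shifted

# Typing three characters inside "<window" (offset 2) stretches that span
# and pushes the attribute span right.
print(shift_spans([("tag", 0, 7), ("attribute", 8, 10)], 2, 3))
```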
<WeirdAl>I really am not awake right now. You're saying good stuff, the right stuff, but I'm having trouble grokking it at 2:40 am
<db48x>actually, you wouldn't even have to do that
<db48x>once you have a highlighted plain text view of the source for the user to edit
<db48x>a lot of the simple editing doesn't change that highlighting
<WeirdAl>... I don't suppose I could talk you into drawing up a guideline or spec for me to follow? :)
<db48x>just read the log of this tomorrow
<WeirdAl>it sounds like a lot, from my standpoint
<db48x>hmm, for xml at least, only adding or removing nodes changes the highlighting
<db48x>WeirdAl: well, look at it this way. if you have a collection of regular expressions that determine the highlighting, you have to run those expressions over the document each time the user types something, right?
<WeirdAl>at least over the portion where they're writing
<db48x>WeirdAl: no, you have to do it over the whole document
<WeirdAl>that said, I was planning to allow for a two-second delay between typing and redrawing
<db48x>otherwise you run into syncing issues
<db48x>like, let's say you only run it over the part that's within 5 lines either way of the cursor
<db48x>but they're editing a long string that covers multiple lines
<WeirdAl>and we get a regexp greater than five lines, I know
<WeirdAl>that too
<db48x>if the beginning of the string goes out of that range, it'll start highlighting the text as if it were code
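The syncing problem is easy to reproduce with any scanner that carries state across lines. A minimal sketch, assuming a toy scanner for backtick-delimited strings that may span lines (escapes ignored); `string_chars` is hypothetical and stands in for whatever rules a highlighter uses:

```python
def string_chars(text, quote="`"):
    """Return the indices of characters inside quote-delimited strings,
    delimiters included. Strings may span lines; escapes are ignored."""
    inside, result = False, set()
    for i, ch in enumerate(text):
        if ch == quote:
            inside = not inside
            result.add(i)
        elif inside:
            result.add(i)
    return result

doc = "x = `one\ntwo\nthree` + y"
start = doc.index("two")  # a window whose first line is mid-string

# Scanning the whole document vs. scanning only the window:
full = {i for i in string_chars(doc) if i >= start}
windowed = {i + start for i in string_chars(doc[start:])}
print(full == windowed)  # prints False
```

In the windowed scan, "two" and "three" come out as code and " + y" comes out as string: exactly the inversion db48x describes.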
<db48x>parsing a document doesn't actually take that long
<WeirdAl>I suspect JS-based data won't cut it
<db48x>unless it's really big
-->|Peter6 (Peter6@moz-397FC4D5.speed.planet.nl) has joined #developers
<db48x>so if you want, you can just measure the time it takes to highlight the document the first time, and base future actions off of that
<WeirdAl>hm, yeah
=-=Peter6 is now known as Peter6_away
<db48x>so if they're editing some enormous file and it takes a minute to parse and highlight, you probably don't want to do it on every keystroke
<db48x>in fact, at some point an infobar saying "this document is so big that I'm not even going to try to highlight it" would be nice
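Basing "future actions" on the first pass can be as simple as timing it and picking one of three behaviors: rehighlight on every keystroke, wait for a pause in typing (like the two-second delay WeirdAl mentions), or give up and show the infobar. A minimal sketch; the thresholds and function names are hypothetical and would need tuning:

```python
import time

# Hypothetical thresholds, in seconds.
PER_KEYSTROKE_LIMIT = 0.05
GIVE_UP_LIMIT = 5.0

def choose_policy(cost):
    """Pick a rehighlighting strategy from the measured cost of one pass."""
    if cost <= PER_KEYSTROKE_LIMIT:
        return "every-keystroke"
    if cost <= GIVE_UP_LIMIT:
        return "debounce"   # rehighlight after a pause in typing
    return "disabled"       # show the "too big to highlight" infobar

def timed_first_pass(highlight, text):
    """Run highlight(text) once; return its result and the chosen policy."""
    t0 = time.perf_counter()
    result = highlight(text)
    return result, choose_policy(time.perf_counter() - t0)
```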
<WeirdAl>db48x: I've been thinking I want to use xpcom to feed the parameters in for highlighting, and then (within a C++-based component) cache the entries so we don't go back to xpcom for the data repeatedly
<db48x>nice
<db48x>WeirdAl: for an xml document you should store that info on the dom tree
<db48x>a similar structure is available for css, I'm sure
<db48x>I don't know how accessible the parse tree for js is though
<db48x>that's a question for brendan
<WeirdAl>js would be a bear
<WeirdAl>as would C++, but I'm not ready to tackle that yet :)
<db48x>WeirdAl: well, the cool thing is that once you have this working for xml, you can get it working for the other languages pretty easily
<WeirdAl>I know.
<db48x>the part that generates the highlighted document is the same for all of them
<WeirdAl>Right now I'm trying to imagine an API or set of IDL's for this, and I'm drawing a complete blank.
<db48x>so that's cool
<WeirdAl>- usually, when I can do the IDL's, that means I've got a good understanding of the basics :)
<db48x>I'd start by writing a highlighter that took offset data from a dom tree
<db48x>you can do that in js
<WeirdAl>I'd rather do it in C++, just to minimize perf issues. I need more experience coding in C++ anyway.
<db48x>once you've done that you'll know whether you want the css data to be presented as a tree in the same fashion, or if you want to have some other datastructure that they'll all use
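One candidate for the "datastructure that they'll all use" is a flat list of classified offset ranges: an XML, CSS, or JS backend only has to emit these, and the renderer stays shared. A minimal sketch; `Span`, `fill_gaps`, and the "plain" class are hypothetical names:

```python
from typing import List, NamedTuple

class Span(NamedTuple):
    cls: str    # language-specific class, e.g. "tag" for XML
    start: int  # offset of the first character
    end: int    # offset one past the last character

def fill_gaps(length, spans, plain="plain"):
    """Sort spans and cover unclassified text with `plain` spans so the
    result tiles [0, length) exactly. Overlapping spans raise ValueError."""
    result: List[Span] = []
    pos = 0
    for s in sorted(spans, key=lambda s: s.start):
        if s.start < pos:
            raise ValueError("overlapping spans: %r" % (s,))
        if s.start > pos:
            result.append(Span(plain, pos, s.start))
        result.append(s)
        pos = s.end
    if pos < length:
        result.append(Span(plain, pos, length))
    return result
```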
<db48x>sure, if you want
<db48x>but I'm sorta talking about building one that you might throw away
<WeirdAl>hehe
<db48x>just to take a first lick at it and see how it looks once you've done that
<WeirdAl>well, what can be written in JS can be written in C++, and I know JS pretty darned well
<db48x>exactly
<db48x>you might even find that the js version is pretty darn fast already
<WeirdAl>probably. Everyone "knows" JS is 10x slower than C++, but only as a generalization that probably isn't true anymore
<db48x>yea
<db48x>it's only slow if you're modifying the dom and displaying it a bunch of times per second
<WeirdAl>okay, I'm going to save what we've got here, and I'll post it to my blog as notes
<db48x>like dhtml animations
<WeirdAl>whoa, wait a sec... I thought that's what reparsing on every keystroke was: modifying & displaying repeatedly
<db48x>yea, but not really
* WeirdAl raises a skeptical eyebrow
<WeirdAl>oh, what the hell, plaintext editor is so fast that I barely notice it anyway
<WeirdAl>if I notice it at all
=-=rhelmer is now known as rhelmer-zzz
<db48x>I mean, you're not looking for a 60 fps here
* db48x inserts 'game' in there somewhere
<WeirdAl>heh, no, humans can't type that fast
<db48x>I can
<db48x>sometimes
<WeirdAl>60 per second, 60 seconds per minute? That's 3600 WPM
<db48x>but you're still not likely to be doing this once per second
<db48x>oh, yea. it usually is measured in wpm
<WeirdAl>oops, 3600/5
<WeirdAl>5 chars to the word
<db48x>I was thinking 60? yea, I can do 60 wpm
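The typing-speed arithmetic, worked through with the 5-characters-per-word convention used above. The second line gives the average gap between keystrokes for a 60 WPM typist, which is the time budget a per-keystroke rehighlight would have to fit in (an average, not a worst case):

```latex
\begin{aligned}
60\ \tfrac{\text{chars}}{\text{s}} \times 60\ \tfrac{\text{s}}{\text{min}}
  &= 3600\ \tfrac{\text{chars}}{\text{min}}
  \;\Rightarrow\; \tfrac{3600}{5} = 720\ \text{WPM} \\
60\ \text{WPM} \times 5\ \tfrac{\text{chars}}{\text{word}}
  &= 300\ \tfrac{\text{chars}}{\text{min}} = 5\ \tfrac{\text{chars}}{\text{s}}
  \;\Rightarrow\; 200\ \text{ms per keystroke}
\end{aligned}
```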