[00:00] mikeal: https://bugzilla.mozilla.org/show_bug.cgi?id=524436 was specific to this page and marked a dupe of that bug
[00:02] isaacs: so, i'm gonna make the sax parser do XML, and do it right, with loose and strict modes. Then the HTML parser can be a layer on top of that which watches the stream of tags and makes all kinds of psychotic changes to it.
[00:02] jackyyll: tlrobinson: it lags in webkit too :P
[00:03] ryah: isaacs: i think html is more or less unbalanced xml
[00:04] isaacs: ryah: "less" ++
[00:04] isaacs:
hello
paragraphbold hello paragraphbold
[00:05] jackyyll: how come node.js doesn't have a wikipedia page?
[00:05] isaacs: that's: heading
[00:05] isaacs: good luck
[00:08] ryah: isaacs: unbalanced xml with some header sent: "2 0 1 0 0 0 3a 0 3c 0 0 0 0 0 0 0 0 0 b 0 0 0 2a c 43 6f 6e 74 65 6e 74 2d 54 79 70 65 53 0 0 0 18 61 70 70 6c 69 63 61 74 69 6f 6e 2f 6f 63 74 65 74 2d 73 74 72 65 61 6d ce "
[00:08] ryah: oops
[00:08] ryah: eccentricities
[00:08] isaacs: that was certainly eccentric!
[00:14] paul_ has joined the channel
[00:16] bentomas: A quick question about node request objects, if I pause a request and never resume it will I suffer any performance losses? Like will node be buffering the messages or something?
[00:17] ryah: bentomas: node wont but the kernel will
[00:17] bentomas: gotcha, won't do that then!
[00:17] ryah: as much as it can of course
[00:18] mikeal has joined the channel
[00:18] ryah: but then it does this: http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Flow_control
[00:18] ryah: that is, it starts telling the sender that it can't handle more data
[00:18] ryah: to back off on the speed
[00:18] ryah: so - do call it!
[00:19] bentomas: gotcha!
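A minimal sketch of the pause/resume calls bentomas asks about. It assumes the stream API of current Node.js releases, not the 2010 request object, and uses a `PassThrough` stream as a hypothetical stand-in for an incoming request.

```javascript
const { PassThrough } = require('stream');

// Stand-in for an incoming request body; a real server would be handed `req`.
const req = new PassThrough();

// Stop delivering 'data' events. If this is never undone, unread bytes pile up
// below node until TCP flow control tells the sender to slow down.
req.pause();
const pausedBefore = req.isPaused();

// Per ryah's advice, do call resume once the handler is ready for more data.
req.resume();
const pausedAfter = req.isPaused();
```

Here `pausedBefore` and `pausedAfter` only record the stream state around the two calls; nothing is actually sent over a socket.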
[00:22] konobi: ryah: http://github.com/orlandov/node-riak
[00:23] bryanl has joined the channel
[00:23] ryah: wow that was fast
[00:23] konobi: orlandov++
[00:24] orlandov: ryah: it's not very good yet :)
[00:24] bentomas has left the channel
[00:24] mikeal: NoSQL FTW!
[00:29] ryah: beginning of node: http://four.livejournal.com/920900.html
[00:31] ryah: also http://four.livejournal.com/923835.html
[00:31] stephenlb has joined the channel
[00:35] jed_ has joined the channel
[00:36] inimino: ryah: nice
[00:38] brapse has joined the channel
[00:40] tmpvar has joined the channel
[00:50] joshbuddy has joined the channel
[00:55] pjb3: I was just looking at the list of node users: http://wiki.github.com/ry/node/node-users
[00:55] pjb3: And maybe I'm dumb
[00:55] pjb3: but isn't US Eastern UTC-5 (-4) ?
[00:55] isaacs: pjb3: yeah, huh.
[00:56] mattly: are there any interfaces to image processing libraries for node yet?
[00:56] deanlandolt: heh, that's an awful lot of folks in the midwest...seems a little odd :D
[00:56] isaacs: guess we got no central time zone users?
[00:56] pjb3: isaacs: ok, thanks, I thought I was crazy
[00:56] isaacs: pjb3: i don't know that you're not.
[00:56] isaacs: ;P
[00:57] ryah: mattly: probably best to do that out-of-process
[00:57] mattly: yeah
[00:57] pjb3: isaacs: well, not in this case, but overall, you're right, good point ;)
[00:57] ryah: mattly: but, no
[00:57] deanlandolt: pjb3: yeah, kriskowal just did a narwhal page based on the node-users page...i had to do some head scratching, then added UTC-5
[00:57] mattly: i'm just thinking about how to handle it
[00:57] mattly: ...thinking about porting a fledling ruby project over
[00:57] mattly: *fledgling
[00:57] ryah: wrapping the imagemagick shell programs
[00:57] isaacs: mattly: you could expose GDImage or imagemagick
[00:58] isaacs: they're available as C libs, at least gdimage is, i know
[00:58] ryah: but image processing is pretty heavy. i don't think it's a terrible idea to start a different process to do it
[00:59] konobi: Imager.pm is pretty awesome, but not sure if it's a separate library
[00:59] ryah: at least for doing resizes and stuff
[00:59] mattly: hm
[00:59] okito has joined the channel
[01:00] mattly: resizing is basically 95% of what i need
[01:01] konobi: mattly: shell out to convert
[01:01] ryah: out of process = safe (node wont crash due to a bad binding), easy to build (just js), easier to write
[01:01] ryah: at the cost of maybe 1ms
[01:02] ryah: per resize
[01:02] mattly: yeah
[01:02] mattly: i can deal with 1ms
[01:02] mattly: thanks
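A rough sketch of the out-of-process approach konobi and ryah describe: shelling out to ImageMagick's `convert`. It assumes `convert` is installed and on the PATH; the helper names `resizeArgs` and `resize` are made up for this example.

```javascript
const { execFile } = require('child_process');

// Build the argument list for `convert in.jpg -resize WxH out.jpg`.
function resizeArgs(src, dst, width, height) {
  return [src, '-resize', `${width}x${height}`, dst];
}

// Run the resize in a separate process, so a crash in the image code
// cannot take the node process down with it.
function resize(src, dst, width, height, callback) {
  execFile('convert', resizeArgs(src, dst, width, height), callback);
}
```

Calling `resize('in.jpg', 'thumb.jpg', 200, 100, cb)` would invoke `cb(err)` once the child process exits; the cost is the process start-up time ryah estimates above.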
[01:02] brapse has joined the channel
[01:03] ryah: http://bulk.fefe.de/scalable-networking.pdf
[01:04] ryah: okay maybe 4 ms
[01:05] ryah: between 2 and 3 it seems
[01:05] ryah: (page 13)
[01:14] kriszyp_ has joined the channel
[01:32] pjb3: What is the top-level unescapse function?
[01:32] pjb3: unescape
[01:33] pjb3: Is that a JS thing? V8? Node?
[01:33] deanlandolt: pjb3: js thing...
[01:34] deanlandolt: encodeURIComponent
[01:34] ryah: pjb3: v8
[01:34] deanlandolt: err...unescape being decodeURIComponent
[01:34] ryah: pjb3: well js thing, v8 implements it
[01:35] eikke has joined the channel
[01:36] pjb3: thanks, didn't realize there was a top-level function for that
[01:37] deanlandolt: yeah, kinda goofy :-/
[01:38] pjb3: I was just reading through the source code for node-router and say match.map(unescape)
[01:38] pjb3: saw
[01:38] pjb3: and I was trying to figure out where that came from
[01:38] pjb3: unescape, that is
[01:39] deanlandolt: pjb3: oh yeah, also in js
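A small illustration of the global `unescape` that pjb3 found in `match.map(unescape)`, next to `decodeURIComponent`. The two agree on plain ASCII escapes but differ on non-ASCII bytes: `unescape` maps each `%XX` to one character, while `decodeURIComponent` decodes the bytes as UTF-8. The sample strings are invented for the example.

```javascript
// unescape is a global function, so it can be passed straight to map().
const parts = ['a%20b', 'c%2Fd'].map(unescape);

// %E4 is read by unescape as the single code unit U+00E4.
const latin1 = unescape('%E4');

// decodeURIComponent treats %C3%A4 as the UTF-8 encoding of U+00E4.
const utf8 = decodeURIComponent('%C3%A4');

// unescape on the same input yields two characters, U+00C3 and U+00A4.
const mismatch = unescape('%C3%A4');
```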
[01:40] creationix has joined the channel
[01:40] midnware2 has joined the channel
[01:45] creationix: ryah: what kind of info do you want on your streaming wrapper to tcpdump?
[01:46] creationix: ngrep seems easier than tcpdump
[01:48] mattly has joined the channel
[01:48] ryah: requested url
[01:49] ryah: to start with :)
[01:49] ryah: would be fun for ethernet lans
[01:52] creationix: would you want all the chunks abstracted away or see each piece of each stream
[01:54] ryah: i just want to sniff my neighbor's traffic
[01:54] ryah: in a real time way
[01:58] creationix: sudo ngrep port 80 -W byline is giving me great output, I'll see about making a quick node wrapper for it
[01:59] pavelz: can't seem to get v8 compiled in ubuntu karmic koala
[02:01] ryah: pavelz: output?
[02:01] ryah: i've compiled it
[02:01] ryah: in karmic
[02:02] creationix: pavelz: make sure you install "build-essential" first, other than that it should work out of the box.
[02:03] creationix: and
[02:03] Yuffster has joined the channel
[02:03] creationix: "libgnutls-dev" if you want tls support
[02:07] RayMorgan_ has joined the channel
[02:09] isaacs: ryah: fyi: a valid xml parser that doesn't use ANY regexps is almost impossible.
[02:10] ryah: isaacs: how do you quantify degrees of impossibility?
[02:10] inimino: isaacs: why?
[02:10] inimino: heh
[02:11] isaacs: because of this: Digit ::= [#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] |
[02:11] isaacs: [#x0966-#x096F] | [#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] |
[02:11] isaacs: [#x0B66-#x0B6F] | [#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] |
[02:11] isaacs: [#x0D66-#x0D6F] | [#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]
[02:11] isaacs: so my cute little string of allowed characters would have to include most of the unicode set.
[02:11] ryah: isaacs: don't follow the html5 spec too closely
[02:11] inimino: isaacs: oh, well... that would be unpleasant to write by hand, yes
[02:11] ryah: just do it by your head :)
[02:11] isaacs: i'm ok being a little bit chauvinistic.
[02:11] isaacs: node speaks english. deal with it.
[02:12] isaacs: jan____ and felixge seem to get by alright with it. the other internationals can fall in line.
[02:12] inimino: where does XML use that Digit production?
[02:12] ryah: yeah
[02:12] ryah: #x0660-#x0669 ?
[02:12] ryah: i wonder what those are
[02:13] inimino: arabic-indic digits
[02:13] ryah: Arabic-Indic digits
[02:13] ryah: crazy
[02:13] ryah: yeah. i think \u0030 - \u0039 is okay :)
[02:13] inimino: isaacs?
[02:13] isaacs: inimino: in http://www.w3.org/TR/REC-xml/
[02:13] isaacs: search for "[88] Digit"
[02:14] inimino: I meant where in the grammar, but OK, I'll look
[02:14] ryah: we'll get arabic-indic digits in the next pass
[02:14] isaacs: "Letter" is even bigger.
[02:14] jed: oh my. that's insane.
[02:14] inimino: isaacs: I have code that generates JavaScript expressions to do those tests
[02:14] inimino: isaacs: and to look up the Unicode categories
[02:15] ryah: fucking xml man
[02:15] isaacs: inimino: great. i'll make it anglocentric, and you can internationalize it.
[02:15] isaacs: ryah: srsly.
[02:15] jed: does digit include things like 一 and 壱 and ①?
[02:15] isaacs: html doesn't actually use those anyhow, and this xml thing is really just a stepping stone to a good html parser.
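One regexp-free way to test the Digit production isaacs pasted at [02:11] is a table of code point ranges. This is only a sketch: the ranges are copied from the lines quoted above, and the name `isXmlDigit` is made up for the example.

```javascript
// Ranges copied from the Digit production quoted at [02:11].
const DIGIT_RANGES = [
  [0x0030, 0x0039], [0x0660, 0x0669], [0x06F0, 0x06F9],
  [0x0966, 0x096F], [0x09E6, 0x09EF], [0x0A66, 0x0A6F], [0x0AE6, 0x0AEF],
  [0x0B66, 0x0B6F], [0x0BE7, 0x0BEF], [0x0C66, 0x0C6F], [0x0CE6, 0x0CEF],
  [0x0D66, 0x0D6F], [0x0E50, 0x0E59], [0x0ED0, 0x0ED9], [0x0F20, 0x0F29],
];

// True if the first code point of `ch` falls inside any listed range.
function isXmlDigit(ch) {
  const cp = ch.codePointAt(0);
  return DIGIT_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}
```

Going by the quoted ranges alone, the characters jed asks about (U+4E00, U+58F1, U+2460) fall outside every range, so `isXmlDigit` returns false for them, while an Arabic-Indic digit such as U+0663 passes.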
[02:16] inimino: isaacs: I'm studiously avoiding writing XML or HTML parsers :-)
[02:17] ryah: inimino: you should get into writing parsers by hand
[02:17] ryah: it's very satisfying
[02:17] ryah: well. when it works
[02:17] ryah: ACTION eyes amqp in the corner
[02:17] isaacs: it is
[02:18] ryah: i used to do everything with ragel
[02:19] inimino: yeah, I've done it, it's satisfying sometimes
[02:19] inimino: but for HTML?
[02:19] inimino: no.
[02:20] ryah: ACTION thinks so
[02:20] inimino: I've looked at production-worthy HTML parsers and they are big, ugly beasts
[02:20] ryah: (of course i'm not doing it)
[02:20] inimino: they do need to be written by hand, generally, but I'm glad there are other people doing it :)
[02:20] isaacs: you guys are jerks. ;P
[02:21] inimino: for reasonable languages, parser generators are awesome :)
[02:21] ryah: i mean, i think http is much harder than html
[02:21] inimino: heh-heh
[02:22] ryah: http://github.com/ry/http-parser/blob/master/http_parser.c <-- not so bad
[02:22] inimino: I think HTTP is an order of magnitude easier than HTML
[02:23] isaacs: ryah: nah, http's WAY easier than html
[02:23] isaacs: course, you did it in C, which makes everything much harder.
[02:23] ryah: that's just cause you guys don't know http :)
[02:23] isaacs: hey, i know http!
[02:23] inimino: it's cause you don't know HTML :-)
[02:23] ryah: maybe :)
[02:23] isaacs: multipart is at least as hard as http, since it allows almost every construct http does.
[02:23] isaacs: but nested.
[02:24] konobi: HTTP is pretty vicious about "do it my way or the highway", html parsers need to be far more forgiving
[02:24] ryah: http is really hard to get right
[02:24] isaacs: It's hard to forgive.
[02:24] ryah: a 90% http parser is pretty simple
[02:24] isaacs: ryah: that's true. i've been pretty happy with node's.
[02:25] isaacs: and the nice thing is that when it's broken, i can bug you about it, and you fix it.
[02:25] ryah: i've been writing it for 2 years - and zed's been writing it for 2 years before that :)
[02:25] konobi: other than the multi-line header values!
[02:25] konobi: =0P
[02:25] ryah: yeah, got to add multi-line headers
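For readers unfamiliar with the multi-line headers konobi mentions: a header value may continue on the next line if that line starts with a space or tab. The sketch below shows the idea in plain JavaScript; it is not how http-parser handles it, and `unfoldHeaders` is a hypothetical helper.

```javascript
// Join continuation lines (those starting with space or tab) onto the
// header line before them. Input is the raw header block with CRLF endings.
function unfoldHeaders(block) {
  const lines = [];
  for (const line of block.split('\r\n')) {
    if (/^[ \t]/.test(line) && lines.length > 0) {
      // Continuation: append to the previous header, separated by one space.
      lines[lines.length - 1] += ' ' + line.trim();
    } else if (line !== '') {
      lines.push(line);
    }
  }
  return lines;
}
```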
[02:25] inimino: HTML parsing is easy if you only care about the 80% cases
[02:25] isaacs: ryah: given any thought to making it not an HTTP parser so much as a generic internet message parser?
[02:25] isaacs: or exposing it as such, at least?
[02:25] isaacs: it'd simplify the multipart parser a bunch.
[02:26] ryah: no
[02:26] ZhouYu has joined the channel
[02:26] inimino: browsers tend to worry about the last 1%
[02:26] inimino: isaacs: I'm not sure there's enough that could be shared
[02:26] pavelz: creationix: http://pastie.org/817558 seems to be a g++ issue :-S
[02:26] isaacs: inimino: basically, just parsing headers+body type things.
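What isaacs means by "headers+body type things" can be sketched as a splitter that knows nothing about HTTP itself: header lines, a blank line, then a body. This is an illustration only (no folding, no repeated headers), and `parseMessage` is a made-up name.

```javascript
// Split a raw message into a header map and a body at the first blank line.
function parseMessage(raw) {
  const split = raw.indexOf('\r\n\r\n');
  const head = split === -1 ? raw : raw.slice(0, split);
  const body = split === -1 ? '' : raw.slice(split + 4);
  const headers = {};
  for (const line of head.split('\r\n')) {
    const colon = line.indexOf(':');
    if (colon === -1) continue; // skip lines that are not "name: value"
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return { headers, body };
}
```

The same function could be applied to each part of a multipart body, which is the reuse isaacs is hoping for.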
[02:27] creationix: pavelz: strange, I just compiled it on my Karmic box with no problem
[02:27] ryah: here is what someone wrote me re the parser:
[02:27] ryah: In case you were curious about exactly what I am doing with http-parser, I have a tcpdump trace of 40,000 flows to port 80 that started in SYN and ended in FIN packets both ways over the course of 1 day. (so this excludes connections that were reset or did not both begin and end during my trace)
[02:27] ryah: I have written an element inside of click that instantiates two http-parser objects per TCP flow, and it also has logic to deal with TCP sequence numbers and reordering. This element can now parse both the incoming and outgoing tcp streams for all except 169 of them. I believe that most of the 169 failures are of the form above, and a handful of them are because my trace file doesn't have 100% of the packets. I have also seen at least one from Microsoft IIS 6 that
[02:29] inimino: "IIS 6 that" …
[02:29] ryah: i.e. 99.5%
[02:29] mindwar_ has joined the channel
[02:29] inimino: that's good
[02:30] inimino: ryah: I thought you wrote http-parser in a week :-)
[02:30] ryah: well...
[02:30] ryah: before that i worked on the ragel parser for a long time
[02:30] isaacs: the time it takes to write something includes all the time spent writing things that weren't that.
[02:30] ryah: and i knew how it worked very well
[02:30] isaacs: it's taken my whole life to write every line of code i produce.
[02:30] ryah: i also had a large test collection
[02:31] inimino: isaacs: I wonder if I can work that into my hourly billing
[02:31] isaacs: inimino: you probably already do! it's why you're charging more than you would have right out of college or whatever.
[02:31] inimino: hehe, true
[02:32] davidjrice: anyone using a good node templating lang?
[02:32] ryah: but! isaacs i was talking to a webkit developer recently
[02:32] davidjrice: thinking of using jan____'s mustache and posix.cat
[02:32] ryah: he assured me very strongly that writing an html parser by hand is very easy
[02:33] isaacs: yeah, i'm starting to see the light.
[02:33] ryah: maybe he didn't say very easy - but 'very doable' i think were the words
[02:33] isaacs: a lot of things are very doable.
[02:33] ryah: an afternoon project
[02:33] bronson has joined the channel
[02:33] ryah: which means a week project
[02:34] isaacs: i'm sure the webkit team is glad to have him, if that's an afternoon project ;)
[02:34] davidjrice: ACTION just found the node modules wiki page. don't mind me :)
[02:35] jan____: mustachemustachemustache
[02:35] davidjrice: jan____: Mu looks interesting!
[02:36] ryah: it's just unbalanced xml with some mild heuristics for which tags get closing priority
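ryah's "unbalanced xml" view can be sketched with a stack of open tags: when a close tag arrives for something deeper in the stack, everything above it is closed first. This is a toy under stated assumptions (a pre-tokenized input, no per-tag priority rules), nowhere near the HTML5 algorithm inimino links to below, and `balance` is a hypothetical name.

```javascript
// tokens: { type: 'open' | 'close', name } or { type: 'text', value }
function balance(tokens) {
  const stack = [];
  const out = [];
  for (const tok of tokens) {
    if (tok.type === 'open') {
      stack.push(tok.name);
      out.push(`<${tok.name}>`);
    } else if (tok.type === 'close') {
      const at = stack.lastIndexOf(tok.name);
      if (at === -1) continue; // stray close tag with no opener: drop it
      // Close everything opened after the matching tag, then the tag itself.
      while (stack.length > at) out.push(`</${stack.pop()}>`);
    } else {
      out.push(tok.value);
    }
  }
  // Close whatever is still open at end of input.
  while (stack.length > 0) out.push(`</${stack.pop()}>`);
  return out.join('');
}
```

Feeding it the tokens for `<p>a<b>b</p>c` yields `<p>a<b>b</b></p>c`: the unclosed `b` is closed before the `p`.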
[02:36] jan____: mustache.js now does streaming
[02:37] inimino: ryah: it depends how much compatibility with the Web you want
[02:37] inimino: for browsers, even 99% isn't good enough
[02:37] inimino: http://mxr.mozilla.org/mozilla-central/source/parser/html/javasrc/Tokenizer.java
[02:37] inimino: that is hsivonen's implementation of the HTML5 parsing algorithm
[02:38] inimino: that's the tokenizer
[02:38] jan____: davidjrice: haven't looked at mu, but since mustache.js now does streaming I don't see the point
[02:38] inimino: this is the tree-builder: http://mxr.mozilla.org/mozilla-central/source/parser/html/javasrc/TreeBuilder.java
[02:38] davidjrice: jan____: yeah? you used it with node?
[02:38] inimino: ryah: anyway, for browser-level compatibility, it can't be any simpler than that
[02:38] ryah: inimino: if i can do what hpricot can do, i'd be happy
[02:38] ryah: http://github.com/whymirror/hpricot/blob/master/ext/hpricot_scan/hpricot_scan.rl
[02:39] jan____: davidjrice: not yet, but no reason why it shouldn't work.
[02:39] inimino: yes, that level is pretty easy
[02:40] inimino: definitely you can do that in an afternoon (or week)
[02:40] inimino: hey, it's the octo-cat
[02:41] pavelz: creationix: what is your g++ 4.4.1?
[02:41] creationix: pavelz: gcc version 4.4.1 (Ubuntu 4.4.1-4ubuntu8)
[02:42] creationix: pavelz: and my uname is - Linux creationix 2.6.32-linode23 #1 SMP Sat Dec 5 16:04:55 UTC 2009 i686 GNU/Linux
[02:42] pavelz: creationix: yeah more or less the same...
[02:42] rtomayko has joined the channel
[02:43] creationix: pavelz: sorry I'm not much more help, I'm not a c++ dev, just a linux user
[02:43] jan____: davidjrice: just hookup the new send() function callback into an event emitter and you should be good
[02:48] hassox has joined the channel
[03:16] nodejsbot has joined the channel
[03:19] technoweenie has joined the channel
[03:39] isaacs has joined the channel
[03:40] isaacs: Hah, felixge should really update the node logger thing to strip html, or at least s/</&lt;/g
[03:40] isaacs: my comment at [00:04] made the whole log bold for today
[03:40] Tim_Smart: lol
[03:40] isaacs: maybe will unbold it
[03:41] isaacs: oh, and
[03:41] tmpvar: hah
[03:41] isaacs: and
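The logger fix isaacs suggests at [03:40] amounts to escaping markup before a chat line is written into an HTML page. A minimal sketch, assuming the log is rendered as HTML text content; `escapeHtml` is a hypothetical helper, not the actual logger code.

```javascript
// Replace the characters that would otherwise be parsed as markup.
// '&' must be handled first so the entities added below are not re-escaped.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}
```

With this in place a message such as `<b>bold` would show up literally in the log instead of turning the rest of the page bold.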