[Catalyst] Alien::Dojo uses regexes to parse HTML, so what?
phaylon
phaylon at dunkelheit.at
Tue May 30 10:52:25 CEST 2006
Dominique Quatravaux said:
> No it's not. We are not trying to address the problem of parsing HTML
> in general, we are trying to address the problem of parsing *one
> single page*.
*One single page of HTML.* HTML is not a data structure, it's a (partly)
fuzzy markup language.
> Since I apparently have to be that explicit to make my point, consider
Well, seeing how you don't seem to *want* to argue about it, but rather
just prove your point, I think it might be better if we end this
discussion?
> my ($url) = qr{<a [^>]+
> href="(http://download.dojotoolkit.org/release[^"]+)"}sx
<a href='http://download.dojotoolkit.org/release/foo'>
<a href="http://www.dojotoolkit.org/download/release/foo">
<a href="ftp://ftp.dojotoolkit.org/realease/foo">
<A HREF="http://download.DojoToolkit.org/release/foo">
> or even
>
> my ($url) = qr{href="(http://download.dojotoolkit.org/release[^"]+)"}sx
Same as above.
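
To make the four anchors above concrete, here is a minimal sketch that runs them against the first quoted pattern, assuming it reads `<a [^>]+ href="(...)"` and is applied with a match operator rather than a bare qr{}. The first entry in @anchors is an extra baseline (not one of the four) in exactly the form the pattern expects; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Assumed reading of the quoted pattern. Under /x the literal
# whitespace and the line break inside the pattern are ignored.
my $re = qr{<a [^>]+
    href="(http://download.dojotoolkit.org/release[^"]+)"}sx;

my @anchors = (
    # Baseline: the only form the pattern expects.
    q{<a href="http://download.dojotoolkit.org/release/foo">},
    # The four variants from this mail.
    q{<a href='http://download.dojotoolkit.org/release/foo'>},
    q{<a href="http://www.dojotoolkit.org/download/release/foo">},
    q{<a href="ftp://ftp.dojotoolkit.org/realease/foo">},
    q{<A HREF="http://download.DojoToolkit.org/release/foo">},
);

my @results;
for my $anchor (@anchors) {
    # In list context a failed match returns an empty list, so $url is undef.
    my ($url) = $anchor =~ $re;
    push @results, $url;
    printf "%s\n    => %s\n", $anchor, defined $url ? $url : 'no match';
}
```

Only the baseline yields a URL. The single-quoted and upper-case variants are the same link as far as HTML is concerned, yet the pattern misses both.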
> and pray tell me what's wrong with those. HTML is a *text* language,
> for chrissake, it was designed *purposefully* so that I am able to do
> that sort of thing.
Perl is also just a "*text* language," please show me the regex to parse
it. Just accept it, regular expressions were *not* made to parse HTML.
They might be built to be utilized by an *HTML Parser* to work with the
HTML, but they don't really parse it themselves.
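
As a rough illustration of that split, here is a sketch that lets a parser find the anchors and uses a regex only to filter the extracted attribute value. It assumes HTML::TokeParser from the CPAN HTML-Parser distribution (not core Perl) is installed. The page fragment is hypothetical, since the real download page is not shown in this thread, and this is not necessarily how Alien::Dojo should do it.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # CPAN HTML-Parser distribution, not core Perl

# Hypothetical page fragment, made up for illustration.
my $html = <<'HTML';
<a href='http://download.dojotoolkit.org/release/foo'>single quotes</a>
<A HREF="http://download.dojotoolkit.org/release/bar">upper-case tag</A>
<a href="http://www.dojotoolkit.org/about">unrelated link</a>
HTML

my $parser = HTML::TokeParser->new(\$html)
    or die "cannot set up parser";

my @urls;
while (my $tag = $parser->get_tag('a')) {
    # $tag is [tagname, \%attr, \@attrseq, text]; names arrive lower-cased.
    my $href = $tag->[1]{href};
    next unless defined $href;
    # The regex only filters a value the parser has already extracted.
    push @urls, $href
        if $href =~ m{\Ahttp://download\.dojotoolkit\.org/release}i;
}

print "$_\n" for @urls;
```

Quoting style and tag case are handled by the parser here, so the pattern no longer has to guess at them.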
I hope *I* have been explicit enough this time.
p