[Catalyst] Alien::Dojo uses regexes to parse HTML, so what?
Dominique Quatravaux
dom at idealx.com
Mon May 29 19:14:30 CEST 2006
phaylon wrote:
> Dominique Quatravaux said:
>
>> I rest my case, unless someone can provide compelling reasons for
>> avoiding regexes *in general* for this task.
>
>
> mst gave only one to demonstrate the whole problem. It's like a
> big, lightsucking black hole.
No it's not. We are not trying to address the problem of parsing HTML
in general, we are trying to address the problem of parsing *one
single page*. Since I apparently have to be that explicit to make my
point, consider (with the fetched page in $html)
my ($url) = $html =~ m{<a [^>]+
href="(http://download\.dojotoolkit\.org/release[^"]+)"}sx;
or even
my ($url) = $html =~ m{href="(http://download\.dojotoolkit\.org/release[^"]+)"}sx;
and pray tell me what's wrong with those. HTML is a *text* language,
for chrissake, it was designed *purposefully* so that I am able to do
that sort of thing.
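
To make that concrete, here is a minimal self-contained sketch of the single-page approach. The sample markup, the release file name in it, and the variable names are all made up for illustration; the real download page may well look different.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Dojo download page; not the real markup.
my $html = <<'HTML';
<html><body>
<a class="dl" href="http://download.dojotoolkit.org/release-0.3.0/dojo.tar.gz">Download</a>
</body></html>
HTML

# Capture the first href that points below the release directory.
my ($url) = $html =~ m{href="(http://download\.dojotoolkit\.org/release[^"]+)"}s;

# Fail loudly instead of carrying on with an undefined URL.
die "no release link found; the page layout probably changed\n"
    unless defined $url;

print "$url\n";
```

The match is in list context so that $url receives the captured group rather than a success count, and the die means a change in the page layout shows up immediately.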
--
Dominique QUATRAVAUX Ingénieur senior
01 44 42 00 08 IDEALX