[Catalyst] Alien::Dojo uses regexes to parse HTML, so what?
A. Pagaltzis
pagaltzis at gmx.de
Mon May 29 21:28:59 CEST 2006
* Dominique Quatravaux <dom at idealx.com> [2006-05-29 19:20]:
> or even
>
> my ($url) = qr{href="(http://download.dojotoolkit.org/release[^"]+)"}sx
You’re getting closer; that has fewer failure modes than trying
to parse the whole anchor tag. Off the top of my head:
# $html is assumed to hold the fetched page source
my (undef, $url) = $html =~ m{href\s*=\s*(["'])?(http://download\.dojotoolkit\.org/release(?:(?!(?(1)\1|[\s>])).)+)}si;
I think that would be enough to catch all possible variations. Untested.
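For anyone who wants to try this, here is a minimal runnable sketch of the same idea. The helper name find_release_url and the sample HTML strings are made up for illustration and are not from Alien::Dojo. The lookahead is placed before each consumed character so that the last character of the URL is kept, and an unquoted value is assumed to end at whitespace or '>'.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Alien::Dojo): return the first Dojo
# release URL found in an href attribute, or undef if there is none.
sub find_release_url {
    my ($html) = @_;
    my ( undef, $url ) = $html =~ m{
        href \s* = \s*
        (["'])?                               # group 1: optional opening quote
        ( http://download\.dojotoolkit\.org/release
          (?: (?! (?(1) \1 | [\s>] ) ) . )+   # stop before the closing quote, or
                                              # before whitespace or '>' if unquoted
        )
    }six;
    return $url;
}

# Made-up sample inputs covering double-quoted, single-quoted and unquoted values.
for my $html (
    q{<a href="http://download.dojotoolkit.org/release/dojo-0.3.0.tar.gz">get</a>},
    q{<A HREF = 'http://download.dojotoolkit.org/release/dojo.zip'>get</A>},
    q{<a href=http://download.dojotoolkit.org/release/dojo.zip>get</a>},
) {
    my $url = find_release_url($html);
    print defined $url ? $url : 'no match', "\n";
}
```

All three samples should print the complete URL without the surrounding quotes or the trailing '>'. It still only covers the variations I could think of, which is rather the point of what follows.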
But:
> and pray tell me what's wrong with those. HTML is a *text*
> language, for chrissake, it was designed *purposefully* so that
> I am able to do that sort of thing.
You are having an XY problem (where X is “parse the page” and Y
is “pattern”). Matt is right: the correct answer is not to parse
at all.
Regards,
--
#Aristotle
*AUTOLOAD=*_;sub _{s/(.*)::(.*)/print$2,(",$\/"," ")[defined wantarray]/e;$1};
&Just->another->Perl->hacker;