[Catalyst] Handling arguments with Regex vs the meaning of paths
apv
apv at sedition.com
Fri Jan 6 20:51:15 CET 2006
On Friday, January 6, 2006, at 10:59 AM, Bill Moseley wrote:
>> Among other things, returning 200 for URLs with random
>> discardable bits means that bots such as search engine
>> spiders may go on a wild goose chase, fetching different-
>> looking-but-not-actually-different URLs all day long without
>> knowing any better.
>
> Spiders don't make up URLs to follow, they follow links. So that's
> only going to happen if you are putting up invalid links.
Which would be completely safe if one never made mistakes
and no one else in the entire Internet was allowed to link to
your site. Once the bad URIs are in the indexes, they can
stay and even propagate depending on the engine and the site
plan.
Bad spiders, hackers, and gateway spammers do make up
URLs for your site.
>
> If you don't want to allow extra segments after the action then it's
> easy to check @args and deal with it as you like.
The main point -- my imitation of broken record ends here :) --
is that best practices dictate you *always* check all input to an
application. Therefore, the default (or at least an easily set
option) should be to check/limit them automatically.
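To make "check by default" concrete, here is a minimal, framework-neutral sketch in Python (not Catalyst code). The ACTIONS table and the dispatch() function are hypothetical names invented for illustration. Each action declares how many trailing arguments it accepts, and anything else gets a 404 without the action having to inspect its arguments itself.

```python
# Hypothetical action table (an assumption for illustration):
# path prefix -> exact number of trailing arguments the action accepts.
ACTIONS = {
    "/log": 1,    # /log/<id>
    "/about": 0,  # /about, nothing after it
}

def dispatch(path):
    """Return an HTTP status code, rejecting unexpected extra segments."""
    segments = [s for s in path.split("/") if s]
    # Try the longest matching action prefix first.
    for n in range(len(segments), 0, -1):
        prefix = "/" + "/".join(segments[:n])
        if prefix in ACTIONS:
            args = segments[n:]
            # Strict by default: a wrong argument count is "not found".
            return 200 if len(args) == ACTIONS[prefix] else 404
    return 404
```

With this table, dispatch("/log/358") gives 200, while dispatch("/log/358/junk") and dispatch("/about/x") both give 404. An action that really wants open-ended arguments would have to opt in explicitly, which is the reverse of having to remember the check every time.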
>
>> If you want such URLs to produce a result rather than just 404,
>> then they should redirect to the canonical URL. I've written
>> about the considerations in
>>
>> Transparent opaque changeable permanent URLs
>> <http://plasmasturm.org/log/358/>
>
> Then what should a request for an invalid article do?
>
> $ HEAD http://plasmasturm.org/log/3583329393/
> 200 OK
It should probably return 404. Pointing out that the server doesn't
currently do so doesn't change the fitness of the ideas in the article.
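As a rough sketch of the behaviour being argued for (Python, for illustration only): KNOWN_IDS and resolve() are hypothetical, and the /log/<id>/ layout is just borrowed from the URLs quoted above, not taken from how that site is actually implemented. Unknown articles get 404, known articles reached through a non-canonical URL get a redirect, and only the canonical URL gets 200.

```python
# Hypothetical set of article ids that actually exist (an assumption).
KNOWN_IDS = {358}

def resolve(path):
    """Return (status, location) for a request under /log/<id>/."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "log" or not parts[1].isdecimal():
        return 404, None
    article_id = int(parts[1])
    if article_id not in KNOWN_IDS:
        return 404, None           # no such article
    canonical = f"/log/{article_id}/"
    if path != canonical:
        return 301, canonical      # real article, non-canonical URL
    return 200, None
```

Here resolve("/log/358/random/bits") redirects to "/log/358/", and resolve("/log/3583329393/") is a plain 404, so a spider never sees two different URLs answering 200 for the same article.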
-Ashley