Fixing a Static Archive of a Drupal Site to Redirect Old URLs with Mod Rewrite

February 18, 2018, Looking at Numbers

On the 10th, the numbers changed dramatically, probably due to using the “page of doom”. It took pre-existing trends and pushed them over the cliff (or shot them to the moon). Look at these numbers:

The total number of indexed pages increased by 10%. The composition of excluded pages changed dramatically.

The biggest change was pages excluded because there was an alternate page with a canonical tag: from 4 to 237. That was due to having URLs with parameters /?page=9, /page/10/?page=4, etc. all having a canonical tag that points to the home page. I wonder if this is a bug. It seems like one. They should point to the parameter-less page, not the home page. I wonder if a rewrite rule caused this.

Pages with redirects grew due to three things: pages missing .html being redirected to the page with .html (or a new WordPress page); deleted pages being redirected to the home page; and pages with ‘reply’ (or some other old Drupal action) in the URL being redirected to the home page.
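
The three kinds of redirect above can be sketched in mod_rewrite roughly like this. It is a sketch rather than the exact rules in use: it assumes the .htaccess file sits in the document root, that the old Drupal reply URLs start with comment/reply/, and old-article.html is a made-up name standing in for a deleted page.

[code]
RewriteEngine On

# Assumed pattern: old Drupal reply URLs such as comment/reply/123 go to the home page.
# The trailing ? drops any query string.
RewriteRule ^comment/reply/ http://riceball.com/? [L,R=301]

# Hypothetical deleted page, sent to the home page.
RewriteRule ^old-article\.html$ http://riceball.com/? [L,R=301]

# No matching file or directory, but a matching .html file exists in the archive:
# redirect to the .html version.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.+)$ /$1.html [L,R=301]
[/code]

The specific rules come first so that the generic .html rule only sees URLs nothing else has claimed.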

The noindex pages grew, but the reason is probably a bug in my rewrites. In the /news/ folder, I used to run an RSS aggregator. I rewrote the URLs to point to the home page. When Apache did the redirect, it preserved the query string parameters. Here’s the bad code:

[code]
RewriteRule ^$ http://riceball.com/ [L,R=301]
RewriteRule ^(.+)$ http://riceball.com/ [L,R=301]
[/code]

The fix is to append a ? to the target URL. That gives the redirect an empty query string, so Apache drops the original parameters. Here’s the fixed code:

[code]
# Trailing ? discards the incoming query string
RewriteRule ^$ http://riceball.com/? [L,R=301]
RewriteRule ^(.+)$ http://riceball.com/? [L,R=301]
[/code]
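
On Apache 2.4 or later, the QSD (query string discard) flag should do the same job without the trailing ?. A sketch, assuming the server is new enough to support the flag:

[code]
# QSD tells mod_rewrite to drop the incoming query string (Apache 2.4+)
RewriteRule ^$ http://riceball.com/ [L,R=301,QSD]
RewriteRule ^(.+)$ http://riceball.com/ [L,R=301,QSD]
[/code]

The trailing ? also works on older versions, so it is the safer choice if the Apache version is unknown.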

I think this also happened to the pages counted under “alternate page with a canonical tag”. I recall making the same fix, but can’t recall when.

The other huge change was the decline in crawl anomalies. That’s due to all the redirects that fixed the dead URLs.

The number of unindexed pages also fell. I’m not sure why that was.

The number of 404 pages increased slightly. The crawler visited around 50 old URLs that I deleted, and so these are legitimate 404s that I’ll need to fix manually.

The other 404 errors are old pages that are now redirecting to good pages. I’ll do the “page of doom” trick again: make a page full of links, and ask Google to index that page and all the linked pages.

There were some soft 404 pages with the /news/ URL, so I requested that one be indexed. I hope Google tests those old URLs.

Time to Wait for Changes

Now, I need to wait to see some results. The old bad URLs were mostly fixed, but I created some new bad URLs. These have been fixed.

PageRank?

I have no idea if the prchecker.info site is any good, but it says I have a rank of 3. I wish I’d written down the rank before. I think it used to be 1, long ago.

An Inventory of Bad URLs and Obsolete Articles

If there’s anything I’ve learned from fixing up my old sites, it’s the value of URLs. Previously, I thought of websites as pages; now I think of the URLs. Previously, I thought the value came from writing the information, and it does, but I ignored the value of URLs.

Those old 404 URLs have some value. If they have inbound links, they carry a little page juice, and a 301 redirect can pass it on to another page.

Outdated articles can be removed, and redirected to a more relevant page on the same site, or they can be edited to link to the better page on the same site. Scattered articles can be rewritten into a single, coherent article, and the old pages can link to it.
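
A redirect from an outdated article to a better page might look like the sketch below. Both file names are hypothetical, and it assumes the rule is added to the existing rewrite rules in the .htaccess file at the document root.

[code]
# Hypothetical paths: send an outdated article to the page that replaces it.
RewriteRule ^old-topic-notes\.html$ /topic-guide.html [L,R=301]
[/code]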
