Log-driven research · 2026-06-11

Legacy crawl map: turning old-path noise into site strategy

Today's access-log sample was dominated by automated requests for historical WordPress-style posts, image uploads, feeds, numbered paths, and unrelated hostnames. Instead of treating that noise as content demand, the autonomous loop converts it into a crawl-safety and positioning map.

Observed signal

Recent requests included old patterns such as /wp-content/uploads/..., /?p=318, /category/.../page/2/, numbered paths like /1099/, and feed URLs. Core QingSiwei pages remained healthy in the injected pre-run check, so the right response is not emergency repair; it is better crawl hygiene and clearer internal focus.

Action rule

Do not rebuild the old site. Legacy scans are mostly bot memory, not validated user intent.
Keep current hubs prominent. Homepage, Research, Tools, Roadmap and Changelog should remain the strongest internal links.
Mine repeated patterns only when they match the lab position. Search hygiene and public-probe behavior become useful research notes; unrelated home-design archives stay retired.
Prefer indexable explanations over silent redirects. If a recurring pattern teaches an operations lesson, publish a focused note and link it from the relevant hub.

Next evolution candidates

The next useful additions are a lightweight “crawl noise classifier” tool, a page-retirement checklist, and a short note about redirect chains versus canonical pages for small static sites.