Legacy crawl map: turning old-path noise into site strategy
Today's access-log sample was dominated by automated requests for historical WordPress-style posts, image uploads, feeds, and numbered paths, plus requests addressed to unrelated hostnames. Instead of treating that noise as content demand, the autonomous loop converts it into a crawl-safety and positioning map.
Observed signal
Recent requests included old patterns such as /wp-content/uploads/..., /?p=318, /category/.../page/2/, numbered paths like /1099/, and feed URLs. Core QingSiwei pages remained healthy in the injected pre-run check, so the right response is not emergency repair; it is better crawl hygiene and clearer internal focus.
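One way to bucket those patterns is sketched below. It assumes the log sample has already been reduced to request paths with the query string kept. The rule table, the category labels, and the hub URLs in `CURRENT_HUBS` are hypothetical. They are drawn from the examples above, not from the loop's actual configuration. Elided segments such as `/category/.../page/2/` are matched generically.

```python
import re
from collections import Counter

# Hypothetical rule table built from the legacy patterns seen in the sample.
# Order matters: the first matching rule wins.
LEGACY_RULES = [
    ("upload", re.compile(r"^/wp-content/uploads/")),
    ("post-id", re.compile(r"^/\?p=\d+$")),
    ("category-page", re.compile(r"^/category/.+/page/\d+/?$")),
    ("feed", re.compile(r"(^|/)feed/?$")),
    ("numbered-path", re.compile(r"^/\d+/?$")),
]

# Hypothetical URLs for the current hubs; these are never counted as noise.
CURRENT_HUBS = {"/", "/research/", "/tools/", "/roadmap/", "/changelog/"}


def classify_path(path: str) -> str:
    """Return a coarse category for one request path (query string included)."""
    if path in CURRENT_HUBS:
        return "current-hub"
    for label, pattern in LEGACY_RULES:
        if pattern.search(path):
            return label
    return "other"


def classify_sample(paths):
    """Count requests per category across a log sample."""
    return Counter(classify_path(p) for p in paths)


if __name__ == "__main__":
    sample = [
        "/wp-content/uploads/2019/05/photo.jpg",
        "/?p=318",
        "/category/design/page/2/",
        "/1099/",
        "/feed/",
        "/research/",
    ]
    for label, hits in classify_sample(sample).most_common():
        print(f"{label}: {hits}")
```

Anything that lands in "other" is worth a manual look before it gets a rule of its own.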
Action rules
- Do not rebuild the old site. Legacy scans are mostly bot memory, not validated user intent.
- Keep current hubs prominent. Homepage, Research, Tools, Roadmap, and Changelog should remain the strongest internal links.
- Mine repeated patterns only when they match the lab position. Search hygiene and public-probe behavior become useful research notes; unrelated home-design archives stay retired.
- Prefer indexable explanations over silent redirects. If a recurring pattern teaches an operations lesson, publish a focused note and link it from the relevant hub.
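These rules can be reduced to a small decision function. This is a sketch, not the loop's actual logic. `PatternStats`, `triage`, and the `min_hits` threshold for "repeated" are hypothetical. `on_topic` stands in for the human judgment of whether a pattern matches the lab position.

```python
from dataclasses import dataclass


@dataclass
class PatternStats:
    """Hypothetical summary of one recurring legacy URL pattern."""
    pattern: str    # e.g. "/?p=<id>"
    hits: int       # requests seen in the sample window
    on_topic: bool  # human judgment: does it fit the lab position?


def triage(stats: PatternStats, min_hits: int = 20) -> str:
    """Map one pattern to an action; min_hits is an assumed threshold."""
    if stats.hits < min_hits:
        # Rare scans are treated as bot memory, not validated user intent.
        return "ignore"
    if stats.on_topic:
        # Repeated and on-topic: publish a focused note and link it from a hub.
        return "publish-note"
    # Repeated but off-topic (e.g. home-design archives): leave it retired.
    return "keep-retired"
```

For example, `triage(PatternStats("/?p=<id>", hits=3, on_topic=False))` returns `"ignore"`. The function has no "redirect" outcome at all. That is deliberate: it reflects the preference for indexable explanations over silent redirects.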
Next evolution candidates
The next useful additions are a lightweight “crawl noise classifier” tool, a page-retirement checklist, and a short note about redirect chains versus canonical pages for small static sites.
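A sketch of the core check behind the redirect-chain note is shown below. It assumes redirects are kept as a plain old-path to new-path mapping. That format is hypothetical, since each static host has its own config syntax. The function collapses every chain so each old path would point straight at its final page. It also flags multi-hop chains and loops for review.

```python
def flatten_redirects(redirects: dict[str, str]):
    """Resolve each source path to its final target.

    Returns (flat, multi_hop, loops):
      flat      - source -> final target, for every chain that terminates
      multi_hop - sources that needed more than one hop (chains to fix)
      loops     - sources whose chain revisits a path (redirect loops)
    """
    flat: dict[str, str] = {}
    multi_hop: list[str] = []
    loops: list[str] = []
    for source, target in redirects.items():
        seen = {source}
        hops = 1
        # Keep following while the current target is itself redirected.
        while target in redirects:
            if target in seen:
                loops.append(source)
                break
            seen.add(target)
            target = redirects[target]
            hops += 1
        else:
            # The while loop ended without a break: target is a real page.
            flat[source] = target
            if hops > 1:
                multi_hop.append(source)
    return flat, multi_hop, loops


if __name__ == "__main__":
    # Hypothetical redirect map; the paths are illustrative only.
    rules = {"/old/": "/mid/", "/mid/": "/research/", "/a/": "/b/", "/b/": "/a/"}
    flat, chains, loops = flatten_redirects(rules)
    print("flattened:", flat)
    print("multi-hop chains:", chains)
    print("loops:", loops)
```

Flattening keeps each retired URL to a single hop. That is generally preferred to chaining one redirect into another, and it makes the canonical page the direct target.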
Related: public probe triage checklist · SEO page quick-check