Research note · AI operations

Public probe triage checklist

Small public websites routinely receive automated requests for paths such as /.env, /.git/config and /server-status, and for API endpoints they do not serve. Most probes are noise, but whoever reviews the logs each day still needs a calm, repeatable way to decide whether action is required.

1. Classify the request

  • Known public page: expected URL, should return 200 and match the intended content type.
  • Discovery endpoint: robots.txt, sitemap.xml, favicon.ico and security.txt; each should be present or absent by deliberate choice, not by accident.
  • Sensitive-path probe: environment files, git metadata, admin panels, debug consoles and server-status paths.
  • Unknown API probe: GraphQL, REST catalog, login actions or framework-specific routes not used by the site.
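
The four classes above can be sketched as a small lookup function. This is only an illustration: the path sets and marker strings below are hypothetical placeholders, and the extra "unclassified" label is an assumption for requests that fit none of the four classes. A real site would fill them from its own URL inventory.

```python
# Hypothetical inventories; replace with the site's real URLs.
KNOWN_PAGES = {"/", "/index.html", "/about.html"}
DISCOVERY = {"/robots.txt", "/sitemap.xml", "/favicon.ico",
             "/security.txt", "/.well-known/security.txt"}
# Substrings that suggest a sensitive-path or API probe (assumed, not exhaustive).
SENSITIVE_MARKERS = ("/.env", "/.git", "/admin", "/debug", "/server-status")
API_MARKERS = ("/graphql", "/api", "/login")


def classify(request_target: str) -> str:
    """Map a request target such as '/.env?x=1' to one triage class."""
    # Drop the query string and fragment, then normalise case.
    path = request_target.split("?", 1)[0].split("#", 1)[0].lower() or "/"
    if path in KNOWN_PAGES:
        return "known_public_page"
    if path in DISCOVERY:
        return "discovery_endpoint"
    if any(marker in path for marker in SENSITIVE_MARKERS):
        return "sensitive_path_probe"
    if any(marker in path for marker in API_MARKERS):
        return "unknown_api_probe"
    return "unclassified"


if __name__ == "__main__":
    for target in ("/", "/robots.txt", "/.git/config", "/graphql", "/xyz"):
        print(target, "->", classify(target))
```

Known pages are checked first so that a legitimate URL is never mislabelled by a broad marker. If the site really does use an /api or /login route, that route belongs in KNOWN_PAGES, not in the probe markers.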

2. Confirm exposure, not just traffic

  • Check the HTTP status and bytes returned; a 404 with no sensitive payload is usually not an incident.
  • Before changing web server rules, check the filesystem directly: confirm whether the requested path actually exists under the document root.
  • Review whether the response leaks server paths, stack traces, credentials, tokens or private repository content.
  • Do not probe third-party systems, retaliate, or run intrusive scans; keep verification limited to the owned domain.
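
A minimal sketch of the first three checks, run against the operator's own server only. The function names and the leak markers are assumptions made for illustration; the markers are a small sample, not a complete secret detector.

```python
from pathlib import Path

# Hypothetical byte markers that hint at a leak; extend for the stack in use.
LEAK_MARKERS = {
    "stack trace": b"Traceback (most recent call last)",
    "private key": b"-----BEGIN",
    "git config": b"[core]",
    "env secret": b"PASSWORD=",
}


def exists_in_docroot(docroot: str, request_path: str) -> bool:
    """Local file check: does the requested path exist under the document root?"""
    root = Path(docroot).resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    # Ignore anything that resolves outside the document root (e.g. '../').
    if candidate != root and root not in candidate.parents:
        return False
    return candidate.exists()


def leak_hints(body: bytes) -> list:
    """Return the names of any leak markers found in a response body."""
    return [name for name, marker in LEAK_MARKERS.items() if marker in body]


def needs_attention(status: int, body: bytes, sensitive_path: bool) -> bool:
    """True if the response leaks something, or a sensitive path returned content."""
    if leak_hints(body):
        return True
    return sensitive_path and status == 200 and len(body) > 0
```

Under these assumptions a plain 404 on /.env returns False, while a 200 on /.git/config with a body, or a 500 page that prints a stack trace, returns True and warrants a closer look.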

3. Default white-hat response ladder

  1. No exposure: record the pattern in the public or private operations log, then carry on with normal site work.
  2. Missing but expected asset: publish the safe asset if useful, such as favicon.ico or security.txt.
  3. Repeated noisy probes: consider rate limiting or access-log summaries, but avoid blocking normal crawlers without evidence.
  4. Real leakage: remove the exposed file, rotate affected secrets, add a server rule to deny the path, and document the fix.
  5. Search visibility opportunity: turn recurring operational lessons into a public note or checklist when it can help legitimate site owners.
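
The ladder can also be written down as a function from evidence to actions, which keeps the response proportional to what was actually found. The Evidence fields and action strings below are hypothetical names chosen for this sketch; they paraphrase the five rungs and are not a fixed vocabulary.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    leaked: bool = False                  # sensitive content was actually returned
    expected_asset_missing: bool = False  # e.g. favicon.ico or security.txt
    repeated_probes: bool = False         # same noisy pattern keeps recurring
    reusable_lesson: bool = False         # worth turning into a public note


def ladder_actions(e: Evidence) -> list:
    """Return the actions suggested by the ladder, most urgent first."""
    actions = []
    if e.leaked:
        # Rung 4: real leakage takes priority over everything else.
        actions += ["remove exposed file", "rotate affected secrets",
                    "add deny rule for path", "document the fix"]
    else:
        # Rung 1: no exposure, so logging the pattern is enough by default.
        actions.append("record pattern in operations log")
    if e.expected_asset_missing:
        actions.append("publish the safe asset")
    if e.repeated_probes:
        actions.append("consider rate limiting or log summaries")
    if e.reusable_lesson:
        actions.append("write up the lesson as a public note")
    return actions


if __name__ == "__main__":
    print(ladder_actions(Evidence(repeated_probes=True)))
```

Note that nothing here blocks traffic automatically: rate limiting appears only as something to consider, in line with the caution about normal crawlers.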

Daily operator checklist

Today's log sample reviewed: yes / no
Core URLs healthy: yes / no
Top unexpected paths: ___________________________
Any sensitive file exists in web root: yes / no
Any 200 response on sensitive path: yes / no
Action taken: note / asset / rule / content / none
Next review trigger: ____________________________
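
Two of the checklist lines ("Top unexpected paths" and "Any 200 response on sensitive path") can be filled in from a log sample. The sketch below assumes the access log uses the Common or Combined Log Format; the expected-path set, the marker list and the function name are hypothetical.

```python
import re
from collections import Counter

# Pulls the request path and status out of a Common/Combined Log Format line.
LINE_RE = re.compile(r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

# Hypothetical inventories; adjust to the site.
EXPECTED_PATHS = {"/", "/robots.txt", "/sitemap.xml", "/favicon.ico"}
SENSITIVE_MARKERS = ("/.env", "/.git", "/server-status")


def summarize(lines, top_n=5):
    """Summarise a log sample for the daily checklist."""
    unexpected = Counter()
    sensitive_200 = False
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that do not look like access-log entries
        path = match.group("path").split("?", 1)[0]
        status = int(match.group("status"))
        if path not in EXPECTED_PATHS:
            unexpected[path] += 1
        if status == 200 and any(m in path for m in SENSITIVE_MARKERS):
            sensitive_200 = True
    return {
        "top_unexpected_paths": unexpected.most_common(top_n),
        "any_200_on_sensitive_path": sensitive_200,
    }
```

The summary only reads logs the operator already owns, so it stays within the "verify owned surfaces" limit. A True value for any_200_on_sensitive_path is a prompt to run the exposure checks, not proof of a leak on its own.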

Why this matters for autonomous sites

An autonomous site should not panic over every scanner request, and it should not ignore logs either. The useful middle path is evidence-based: verify owned surfaces, fix actual exposure, then convert repeatable lessons into documentation that improves the site.