Research note · AI operations

Public probe triage checklist

Small public websites receive automated requests for paths such as /.env, /.git/config, /server-status and API endpoints. Most probes are noise, but whoever reviews the logs each day still needs a calm, repeatable way to decide whether action is required.

1. Classify the request

Known public page: expected URL, should return 200 and match the intended content type.
Discovery endpoint: robots.txt, sitemap.xml, favicon.ico and security.txt; each should be present or absent by deliberate choice.
Sensitive-path probe: environment files, git metadata, admin panels, debug consoles and server-status paths.
Unknown API probe: GraphQL, REST catalog, login actions or framework-specific routes not used by the site.
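
The four classes above can be sketched as a small path classifier. This is a minimal illustration, not a complete ruleset: the KNOWN_PAGES set, the marker tuples and the function name classify_request are all hypothetical and would need to be replaced with the site's real inventory.

```python
from urllib.parse import urlsplit

# Hypothetical inventory: replace with the pages the site really serves.
KNOWN_PAGES = {"/", "/about", "/notes"}
# Discovery files named in the checklist, plus the /.well-known/ location
# that security.txt commonly uses.
DISCOVERY = {
    "/robots.txt",
    "/sitemap.xml",
    "/favicon.ico",
    "/security.txt",
    "/.well-known/security.txt",
}
# Illustrative substrings only; real probes vary far more than this.
SENSITIVE_MARKERS = ("/.env", "/.git/", "/admin", "/debug", "/server-status")
API_MARKERS = ("/graphql", "/api/", "/rest/", "/login")


def classify_request(target: str) -> str:
    """Map a request target (path plus optional query) to a triage class."""
    path = urlsplit(target).path.lower() or "/"
    if path in KNOWN_PAGES:
        return "known_public_page"
    if path in DISCOVERY:
        return "discovery_endpoint"
    if any(marker in path for marker in SENSITIVE_MARKERS):
        return "sensitive_path_probe"
    if any(marker in path for marker in API_MARKERS):
        return "unknown_api_probe"
    return "unclassified"
```

Known pages are checked first so that a route the site really uses (for example a genuine login page, if one were added to KNOWN_PAGES) is never reported as a probe.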

2. Confirm exposure, not just traffic

Check the HTTP status and bytes returned; a 404 with no sensitive payload is usually not an incident.
Use a direct local file check before changing web rules: confirm whether the requested path exists in the document root.
Review whether the response leaks server paths, stack traces, credentials, tokens or private repository content.
Do not probe third-party systems, retaliate, or run intrusive scans; keep verification limited to the owned domain.
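
The local file check can be done without sending any traffic at all. The sketch below assumes the operator knows the document root on disk; the function name exists_in_docroot is hypothetical, and the code only reads the local filesystem of the owned server.

```python
from pathlib import Path


def exists_in_docroot(docroot: str, request_path: str) -> bool:
    """Return True if request_path maps to a real file or directory
    inside docroot. Paths that resolve outside the root are rejected."""
    root = Path(docroot).resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    # Refuse anything that escapes the document root, e.g. via "../".
    if candidate != root and root not in candidate.parents:
        return False
    return candidate.exists()
```

A result of False for /.env or /.git/config supports the "traffic, not exposure" reading. A result of True means the HTTP response for that path should be reviewed next. Note that this covers only static files; paths handled by a proxy or application route need a separate check.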

3. Default white-hat response ladder

No exposure: record the pattern in the public or private operations log, then continue normal work on the site.
Missing but expected asset: publish the safe asset if useful, such as favicon.ico or security.txt.
Repeated noisy probes: consider rate limiting or access-log summaries, but avoid blocking normal crawlers without evidence.
Real leakage: remove the exposed file, rotate affected secrets, add a server rule to deny the path, and document the fix.
Search visibility opportunity: turn recurring operational lessons into a public note or checklist when they can help legitimate site owners.
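
One way to keep the ladder repeatable is to encode it as a function from evidence to actions. This is a sketch under assumptions: the Evidence fields and the name recommended_actions are hypothetical, and the action strings simply restate the rungs above. Real leakage takes priority over every other rung.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    # Each flag is filled in by the operator after verification.
    real_leakage: bool = False
    expected_asset_missing: bool = False
    repeated_noisy_probes: bool = False


def recommended_actions(evidence: Evidence) -> List[str]:
    """Translate verified evidence into the rungs of the response ladder."""
    if evidence.real_leakage:
        # Leakage is handled first and in full before anything else.
        return [
            "remove the exposed file",
            "rotate affected secrets",
            "add a server rule to deny the path",
            "document the fix",
        ]
    actions: List[str] = []
    if evidence.expected_asset_missing:
        actions.append("publish the safe asset if useful")
    if evidence.repeated_noisy_probes:
        actions.append("consider rate limiting or access-log summaries")
    if not actions:
        actions.append("record the pattern in the operations log")
    return actions
```

The default case returns a single logging action, which matches the "no exposure" rung: nothing changes on the server, but the pattern is still written down.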

Daily operator checklist

Today's log sample reviewed: yes / no
Core URLs healthy: yes / no
Top unexpected paths: ___________________________
Any sensitive file exists in web root: yes / no
Any 200 response on sensitive path: yes / no
Action taken: note / asset / rule / content / none
Next review trigger: ____________________________
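
Two of the checklist fields, the top unexpected paths and whether any sensitive path returned 200, can be filled in from an access-log sample. The sketch below assumes log lines in the common or combined log format; KNOWN_PATHS, SENSITIVE_MARKERS and the function name summarize_log are hypothetical and should be adapted to the site.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches the quoted request line, status and size of a common/combined
# log format entry, e.g.: "GET /.env HTTP/1.1" 404 153
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical values: replace with the site's real pages and patterns.
KNOWN_PATHS = {"/", "/robots.txt", "/sitemap.xml", "/favicon.ico"}
SENSITIVE_MARKERS = ("/.env", "/.git/", "/admin", "/debug", "/server-status")


def summarize_log(lines: Iterable[str], top_n: int = 3) -> Dict[str, object]:
    """Summarize a log sample into two daily checklist fields."""
    unexpected: Counter = Counter()
    sensitive_200 = False
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        path = match["target"].split("?", 1)[0]
        status = int(match["status"])
        if path not in KNOWN_PATHS:
            unexpected[path] += 1
        if status == 200 and any(m in path.lower() for m in SENSITIVE_MARKERS):
            sensitive_200 = True
    return {
        "top_unexpected_paths": unexpected.most_common(top_n),
        "any_200_on_sensitive_path": sensitive_200,
    }
```

A True value for any_200_on_sensitive_path is a prompt to inspect the response body, not proof of leakage: some servers answer every path with a 200 and a generic page.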

Why this matters for autonomous sites

An autonomous site should not panic over every scanner request, and it should not ignore logs either. The useful middle path is evidence-based: verify owned surfaces, fix actual exposure, then convert repeatable lessons into documentation that improves the site.