Hello Gaven,
The problem you're describing, where SNMP (Simple Network Management Protocol) stops accepting inbound connections after a DNS outage, is most likely caused by how the SNMP service handles name resolution for the FQDNs in its security settings.
Possible Causes:
The SNMP service does not have a built-in mechanism to handle DNS failures dynamically when using FQDNs in security settings.
A bug or limitation in certain versions of Windows Server that prevents the SNMP service from recovering automatically once DNS resolution is restored.
Solutions:
1. Use IP Address for SNMP Security:
Change SNMP Security Settings: Instead of using FQDNs for SNMP access control, consider using IP addresses in the SNMP security configuration. This will remove the dependency on DNS resolution and prevent the issue from occurring if DNS is unavailable.
Pro: This eliminates the DNS-related issue entirely.
Con: You lose the flexibility of using FQDNs, especially in dynamic or changing network environments.
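If you go this route, you first need the current IP addresses behind the FQDNs you have configured. The sketch below is one way to collect them with Python's standard library. The function name resolve_managers and the host name nms01.example.com are placeholders for illustration, not anything taken from your environment, and the script only looks names up. It does not change any SNMP settings.

```python
import socket

def resolve_managers(hostnames, resolver=socket.getaddrinfo):
    """Return {hostname: sorted list of unique IP addresses}; empty list if lookup fails."""
    resolved = {}
    for name in hostnames:
        try:
            infos = resolver(name, None)
        except OSError:
            # socket.gaierror (DNS failure) is a subclass of OSError.
            resolved[name] = []
            continue
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address as a string.
        resolved[name] = sorted({info[4][0] for info in infos})
    return resolved
```

For example, resolve_managers(["nms01.example.com"]) returns a dictionary mapping that name to its addresses, which you can then enter on the Security tab of the SNMP service in place of the FQDN. Run it while DNS is healthy, and re-run it whenever a management host changes address.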
2. DNS Caching and TTL:
Increase the DNS Record TTL (Time-to-Live): If you must use FQDNs, check whether the DNS records for your management hosts have a short TTL. Raising the TTL on those records lets resolvers keep the answers cached for longer, so lookups made during a short DNS outage are more likely to be served from cache instead of failing.
Pro: This would reduce the impact of DNS failures and reduce the frequency of FQDN lookups.
Con: Not a complete fix, but it could mitigate the issue during short DNS outages.
3. Update Windows and SNMP Service:
Check for Updates: Make sure that you are running the latest patches and updates for the version of Windows Server you're using. Microsoft may have released fixes for issues related to SNMP and DNS interactions in later updates.
Pro: A patch or update could address the issue if it's a known bug.
Con: It may not resolve the issue if it's an inherent limitation of the SNMP service.
5. Use Event Log Monitoring:
Monitor for DNS Failures: Even though you mentioned there's nothing in the Event Log, you could create custom monitoring for DNS failures and SNMP service states. Use tools like System Center Operations Manager (SCOM) or third-party solutions to get more granular visibility into service health, which could help in identifying whether there are underlying issues not logged by SNMP itself.
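If you don't have SCOM or a similar product available, a small script can provide basic visibility. The following is a minimal sketch, assuming you poll it on a schedule (for example from Task Scheduler). It checks whether the configured names resolve and calls a callback when DNS goes from failing back to healthy. The class name DnsRecoveryWatcher and the on_recovery callback are hypothetical names used for illustration. What the callback does is up to you: it could write a log entry, raise an alert, or trigger a restart of the SNMP service.

```python
import socket

def can_resolve(name, resolver=socket.getaddrinfo):
    """Return True if the name currently resolves, False on any lookup error."""
    try:
        resolver(name, None)
        return True
    except OSError:
        # socket.gaierror (DNS failure) is a subclass of OSError.
        return False

class DnsRecoveryWatcher:
    """Tracks DNS health for a set of names and reports failure-to-recovery transitions."""

    def __init__(self, names, on_recovery, resolver=socket.getaddrinfo):
        self.names = list(names)
        self.on_recovery = on_recovery  # hypothetical hook: log, alert, or restart SNMP
        self.resolver = resolver
        self.was_healthy = True         # assume DNS is healthy until a poll says otherwise

    def poll(self):
        """Check all names once; call on_recovery if DNS just came back."""
        healthy = all(can_resolve(n, self.resolver) for n in self.names)
        if healthy and not self.was_healthy:
            self.on_recovery()
        self.was_healthy = healthy
        return healthy
```

The watcher only fires on the transition from unhealthy to healthy, so a long outage followed by recovery produces a single callback rather than one per poll. Note that the state lives in memory, so the script has to keep running between polls (or you would need to persist was_healthy yourself).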
Conclusion:
The most reliable long-term solution is to switch from FQDN-based SNMP access control to IP addresses. However, if you must continue using FQDNs, you can explore automating SNMP service restarts after DNS recovers or increasing the TTL on the relevant DNS records. Additionally, keep your systems fully updated so that any known bugs affecting SNMP are patched.