System Alerts
System Alerts monitor the health of WISdom's data collection infrastructure. Unlike regular monitoring alerts that track SQL Server performance and availability, System Alerts detect issues with the data collection process itself.
System Alert Types
No Recent Collections
Monitors whether data collectors are successfully collecting data from monitored instances.
Default Threshold: 5 minutes without data collection
Scope: Per data collector
Severity: Critical
Trigger Condition: When a data collector fails to complete any data collection for the specified duration
Common Causes:
- Data collector service stopped or crashed
- Network connectivity issues preventing access to the Azure API
- Data collector VM shut down or short on resources
Recommendations:
- Verify the data collector service is running
- Check network connectivity from the data collector to the Azure API
- Review data collector logs for error messages
- Confirm the data collector VM has sufficient resources (CPU, memory, disk)
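The trigger condition for this alert can be sketched as a simple staleness check. This is a minimal illustration only, not WISdom's actual implementation: the function name, the constant, and the idea of tracking a single "last successful collection" timestamp per data collector are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Default threshold taken from this section; the constant name is hypothetical.
NO_COLLECTION_THRESHOLD = timedelta(minutes=5)


def no_recent_collections(
    last_success: datetime,
    now: datetime,
    threshold: timedelta = NO_COLLECTION_THRESHOLD,
) -> bool:
    """Return True when no collection has completed within the threshold.

    `last_success` is assumed to be the time the data collector last
    completed any collection; how that value is stored is not specified
    by the documentation.
    """
    return now - last_success >= threshold


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A collector that last reported six minutes ago would be alerting.
    print(no_recent_collections(now - timedelta(minutes=6), now))
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check easy to test with fixed timestamps.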
Collection Errors
Monitors connection and collection failures for individual target instances.
Default Threshold: 5 minutes of continuous collection failures
Scope: Per target instance
Severity: Critical
Trigger Condition: When collection attempts to a specific target fail continuously for the specified duration
Common Causes:
- Target instance unavailable or offline
- Authentication failures for a specific target
- Firewall blocking connections to the target
- Network latency causing connection timeouts
- Target instance under heavy load, preventing connections
Recommendations:
- Verify the target instance is online and accessible
- Test network connectivity from the data collector to the target
- Confirm collection account permissions on the target
- Review firewall rules between the data collector and target
- Check target instance performance and resource availability
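The word "continuous" in the trigger condition matters: a single successful collection resets the clock for that target. The sketch below shows one way such a per-target failure streak could be tracked. It is an assumption-laden illustration, not the product's code; the `FailureTracker` class, its method names, and the target identifiers are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict

# Default threshold taken from this section; the constant name is hypothetical.
FAILURE_THRESHOLD = timedelta(minutes=5)


class FailureTracker:
    """Tracks, per target instance, when the current failure streak began."""

    def __init__(self, threshold: timedelta = FAILURE_THRESHOLD) -> None:
        self.threshold = threshold
        self._streak_start: Dict[str, datetime] = {}

    def record(self, target: str, succeeded: bool, at: datetime) -> bool:
        """Record one collection attempt; return True if the alert should fire."""
        if succeeded:
            # Any success ends the streak, so the failures are no longer continuous.
            self._streak_start.pop(target, None)
            return False
        # Remember the first failure of the streak; later failures keep that start time.
        start = self._streak_start.setdefault(target, at)
        return at - start >= self.threshold


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0)
    tracker = FailureTracker()
    for minutes in (0, 2, 4, 5):
        firing = tracker.record("sql01", False, t0 + timedelta(minutes=minutes))
        print(minutes, firing)
```

Because streaks are keyed by target, a failing instance does not affect the alert state of other instances on the same data collector, which matches the per-target scope described above.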
Low Space on Data Collector Management Drive
Monitors available disk space on data collector VMs to prevent collection failures due to insufficient storage.
Default Threshold: 5 GB remaining disk space
Scope: Per data collector
Severity: Critical
Trigger Condition: When available disk space on the data collector management drive falls to or below the specified threshold
Common Causes:
- Collection files not being uploaded successfully to the Azure API
- Collection Service account lacking permission to remove the files
- Temporary collection data not being cleaned up
- Insufficient initial disk allocation
Recommendations:
- Free disk space by removing old logs or temporary files
- Increase the disk size allocated to the data collector VM
- Review the Collection Service account's permissions to remove collection files
- Check for failed uploads or stuck collection processes
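To confirm free space on a data collector by hand, a check along these lines may help. It is a sketch under stated assumptions: the function names are hypothetical, the example path is a placeholder for wherever the management drive is mounted, and treating "5 GB" as 5 × 1024³ bytes is a guess, since the documentation does not say whether decimal or binary units apply.

```python
import shutil

GIB = 1024 ** 3
# Default threshold from this section; binary units are an assumption.
LOW_SPACE_THRESHOLD_BYTES = 5 * GIB


def is_low_space(free_bytes: int, threshold_bytes: int = LOW_SPACE_THRESHOLD_BYTES) -> bool:
    """Return True when free space has fallen to or below the threshold."""
    return free_bytes <= threshold_bytes


def check_drive(path: str, threshold_bytes: int = LOW_SPACE_THRESHOLD_BYTES) -> bool:
    """Check the drive that contains `path` against the threshold."""
    free_bytes = shutil.disk_usage(path).free
    return is_low_space(free_bytes, threshold_bytes)


if __name__ == "__main__":
    # "." is a placeholder; substitute a path on the management drive.
    usage = shutil.disk_usage(".")
    print(f"free: {usage.free / GIB:.1f} GiB, low space: {check_drive('.')}")
```

The comparison uses `<=` to mirror the "falls to or below" wording of the trigger condition.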