Known Issues
Self Node Remediation has several known issues:
- Currently only one health detection system (e.g. NHC, MHC) is supported at the same time (i.e. you can’t use NHC and MHC at the same time)
- The timeout to assume the node has been rebooted is the same for all nodes. The safe timeout should be configured for the highest watchdog timeout
- Uninstalling the operator doesn’t remove the self node remediation daemonset. A user should delete the SelfNodeRemediationConfig CR to remove the daemonset
- Upon installing self node remediation operator, it might take up to 2 minutes before the daemonset is deployed
- IPv6 is not supported
- At least two workers are required to get node fencing
- At release 0.3 channel name was changed from “alpha” to “stable”. This means that when releases prior to 0.3 are upgraded, the channel name needs to be changed manually
- Prior to installing SNR on a Kubernetes 1.25+ cluster, a user must manually set a privileged PSA label (i.e. pod-security.kubernetes.io/enforce: privileged) on SNR’s namespace. It gives SNR’s agents permissions to reboot the node (in case it needs to be remediated).