Drift Detection
Drift detection is a feature that periodically executes a Terraform plan on a Configuration to ensure that the expected state, as defined by the Terraform state, aligns with the actual cloud resources. To use drift detection, Configurations must explicitly opt in through their specification.
apiVersion: terraform.appvia.io/v1alpha1
kind: Configuration
metadata:
  name: bucket
spec:
  module: https://github.com/terraform-aws-modules/terraform-aws-s3-bucket.git?ref=v3.1.0
  providerRef:
    namespace: terraform-system
    name: aws
  # Enable drift detection
  enableDriftDetection: true
Tuning Drift Detection
Administrators have the ability to fine-tune drift detection through the controller, which offers two key configuration options:
Drift Intervals
The driftInterval parameter specifies the duration that must elapse following the last Terraform plan execution (as recorded in the [Configuration](docs/terranetes-controller/reference/configurations.terraform.appvia.io.md) object's status) before a new drift detection check is initiated. The default value for this interval is 3h, indicating that a drift check will be performed every three hours from the last transition time for a given Configuration object, provided it falls within the defined driftThreshold.
It is essential to note that the drift check is always measured from the last Terraform plan execution. If the Configuration is modified within the specified interval, the timer resets, and the next drift check will occur one full interval (three hours by default) from the time of modification.
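The timing rule above can be sketched in a few lines. This is an illustration only, not the controller's code: the function name `next_drift_check` and the constant `DEFAULT_DRIFT_INTERVAL` are hypothetical, and the sketch assumes a modification triggers a fresh plan whose timestamp replaces the previous one.

```python
from datetime import datetime, timedelta

# Documented default of 3h; the constant name itself is hypothetical.
DEFAULT_DRIFT_INTERVAL = timedelta(hours=3)


def next_drift_check(
    last_plan: datetime,
    drift_interval: timedelta = DEFAULT_DRIFT_INTERVAL,
) -> datetime:
    """Earliest time a drift check may start, measured from the last plan."""
    return last_plan + drift_interval


# Last plan ran at 09:00, so the next drift check is due at 12:00.
last_plan = datetime(2024, 1, 1, 9, 0)
print(next_drift_check(last_plan))  # 2024-01-01 12:00:00

# A modification at 10:30 produces a new plan, which resets the timer.
last_plan = datetime(2024, 1, 1, 10, 30)
print(next_drift_check(last_plan))  # 2024-01-01 13:30:00
```

Because the calculation depends only on the most recent plan timestamp, any new plan pushes the next check out by a full interval.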
The driftInterval can be customized through the Helm value controller.driftInterval, with the format specified in minutes or hours, such as 10m or 10h.
Drift Threshold
The driftThreshold parameter is a configurable value that serves as a safeguard against overwhelming the cloud provider's API with drift detection checks. Since these checks involve executing a terraform plan, they generate API requests to the cloud provider. Consequently, a large number of Configurations initiating drift checks simultaneously could lead to API timeouts and retries due to rate limiting.
The driftThreshold is expressed as a percentage, represented by a float value between 0 and 1. This percentage determines the maximum number of Configuration resources that can concurrently execute a drift check.
Notably, this threshold considers all Configuration resources, including those without enableDriftDetection, to ensure protection against Cloud API limits.
Scenario 1:
- Total Configuration resources: 10
- Resources currently undergoing Terraform operations (plan or apply): 1
- driftThreshold: 0.2 (equivalent to 20% of total resources, allowing a maximum of 2 resources)
- Outcome: A Configuration with enableDriftDetection set to true will initiate a drift detection check, as the current number of resources in progress is below the defined threshold.
Scenario 2:
- Total Configuration resources: 10
- Resources currently undergoing Terraform operations (plan or apply): 2
- driftThreshold: 0.2 (equivalent to 20% of total resources, allowing a maximum of 2 resources)
- Outcome: A Configuration with enableDriftDetection set to true will not initiate a drift detection check at this time, as the current number of resources in progress has reached the defined threshold. The check will be re-evaluated after a fixed interval of 5 minutes.
Scenario 3:
- Total Configuration resources: 10
- Resources currently undergoing Terraform operations (plan or apply): 0
- driftThreshold: 0.1 (equivalent to 10% of total resources, allowing a maximum of 1 resource)
- Outcome: A Configuration with enableDriftDetection set to true will initiate a drift detection check, as no resources are currently in progress and the maximum number of resources that can run simultaneously is 1 (fractional results are rounded up).
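The arithmetic behind the three scenarios can be reproduced with a short sketch. It is not taken from the controller: the function names are hypothetical, and it assumes the limit is the total number of Configurations multiplied by driftThreshold, rounded up, with a check allowed only while the in-progress count is strictly below that limit.

```python
import math


def max_concurrent(total: int, drift_threshold: float) -> int:
    """Maximum number of Configurations allowed in progress at once (assumed rule)."""
    if not 0 <= drift_threshold <= 1:
        raise ValueError("drift_threshold must be between 0 and 1")
    # round() guards against float noise before rounding up to a whole resource.
    return math.ceil(round(total * drift_threshold, 9))


def may_start_drift_check(total: int, running: int, drift_threshold: float) -> bool:
    """True while the number of in-progress operations is below the limit."""
    return running < max_concurrent(total, drift_threshold)


print(may_start_drift_check(10, 1, 0.2))  # Scenario 1: True
print(may_start_drift_check(10, 2, 0.2))  # Scenario 2: False
print(may_start_drift_check(10, 0, 0.1))  # Scenario 3: True
```

Rounding up means that a small threshold on a small cluster still permits at least one check whenever the product is greater than zero.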
Selection Process
The controller selects a Configuration for drift detection based on the following criteria:
- Drift detection is explicitly enabled within the configuration's specification, denoted by spec.enableDriftDetection: true.
- The configuration has successfully completed a Terraform lifecycle, encompassing plan, approval, and apply phases.
- The elapsed time since the last successful Terraform plan execution exceeds the defined drift interval.
- The current number of concurrent Terraform plan or apply operations is below the drift threshold, which helps keep the controller within the cloud provider's API rate limits.
The controller's selection process operates on a best-effort basis, without a predefined order. This approach ensures that all eligible Configuration resources will be evaluated for drift detection over time.
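Taken together, the criteria amount to a single eligibility predicate. The following is a minimal sketch under stated assumptions, not the controller's implementation: the `ConfigurationState` class, its fields, and `eligible_for_drift_check` are hypothetical names, and the threshold rule (total multiplied by driftThreshold, rounded up, with the in-progress count strictly below it) is an assumption drawn from the scenarios.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ConfigurationState:
    """Hypothetical view of the fields the selection criteria depend on."""
    enable_drift_detection: bool  # spec.enableDriftDetection
    lifecycle_complete: bool      # plan, approval and apply have all succeeded
    last_plan: datetime           # time of the last successful plan


def eligible_for_drift_check(
    cfg: ConfigurationState,
    now: datetime,
    total: int,
    running: int,
    drift_threshold: float,
    drift_interval: timedelta = timedelta(hours=3),
) -> bool:
    """Apply the four criteria in order; any failure rules the Configuration out."""
    if not cfg.enable_drift_detection:
        return False
    if not cfg.lifecycle_complete:
        return False
    if now - cfg.last_plan <= drift_interval:
        return False
    # Assumed limit: share of all Configurations, rounded up.
    limit = math.ceil(round(total * drift_threshold, 9))
    return running < limit


cfg = ConfigurationState(True, True, datetime(2024, 1, 1, 9, 0))
now = datetime(2024, 1, 1, 13, 0)
print(eligible_for_drift_check(cfg, now, total=10, running=1, drift_threshold=0.2))  # True
print(eligible_for_drift_check(cfg, now, total=10, running=2, drift_threshold=0.2))  # False
```

The cheap per-object checks come first so that the cluster-wide count is only consulted for Configurations that are otherwise ready.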