# Configuration Drift Detection Tasks (Expert Tier)
#
# Each task provisions correct infrastructure via setup_commands, then the
# DriftEngine randomly applies a subset of possible_drifts. The agent must
# audit the environment, discover which resources drifted, and fix only those.
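#
# Entry layout, as a sketch inferred from the tasks in this file. The values in
# angle brackets are illustrative placeholders, and the field semantics are
# assumptions rather than confirmed DriftEngine behaviour:
#
#   - task_id: <integer>
#     description: <prompt describing the desired infrastructure>
#     desired_state_spec: <terse restatement of the target state>
#     setup_commands:
#       - <aws cli command that provisions the correct baseline>
#     possible_drifts:
#       - command: <aws cli command that changes one setting>
#         description: <one-line summary of that drift>
#     success_criteria:
#       services:
#         - <aws service name>
#       state_checks:
#         - command: <aws cli read command>
#           output_contains: <substring expected in the output>
#         - command: <aws cli read command>
#           json_path: <JSONPath into the command's JSON output>
#           expected: <value expected at that path>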
- task_id: 24
  description: >
    The following infrastructure should exist: S3 bucket 'config-store' with
    versioning enabled, a lifecycle rule named 'expire-old' that expires
    non-current object versions after 90 days, and server-side encryption
    using AES256. DynamoDB table 'sessions' with provisioned throughput of
    100 RCU and 100 WCU. Some resources may have drifted from the desired
    specification. Audit the current state and fix any configuration that
    does not match.
  desired_state_spec: >
    S3 bucket 'config-store': versioning=Enabled, lifecycle rule 'expire-old'
    expiring non-current versions after 90 days, SSE with AES256.
    DynamoDB table 'sessions': 100 RCU, 100 WCU.
  setup_commands:
    - aws s3api create-bucket --bucket config-store
    - >-
      aws s3api put-bucket-versioning --bucket config-store
      --versioning-configuration Status=Enabled
    - >-
      aws s3api put-bucket-lifecycle-configuration --bucket config-store
      --lifecycle-configuration '{"Rules":[{"ID":"expire-old","Status":"Enabled","NoncurrentVersionExpiration":{"NoncurrentDays":90},"Filter":{"Prefix":""}}]}'
    - >-
      aws s3api put-bucket-encryption --bucket config-store
      --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
    - >-
      aws dynamodb create-table --table-name sessions
      --attribute-definitions AttributeName=id,AttributeType=S
      --key-schema AttributeName=id,KeyType=HASH
      --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100
  possible_drifts:
    - command: >-
        aws s3api put-bucket-versioning --bucket config-store
        --versioning-configuration Status=Suspended
      description: Versioning suspended on 'config-store'
    - command: >-
        aws s3api delete-bucket-lifecycle --bucket config-store
      description: Lifecycle rule removed from 'config-store'
    - command: >-
        aws s3api delete-bucket-encryption --bucket config-store
      description: Encryption removed from 'config-store'
    - command: >-
        aws dynamodb update-table --table-name sessions
        --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=100
      description: DynamoDB RCU reduced to 5
    - command: >-
        aws dynamodb update-table --table-name sessions
        --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=5
      description: DynamoDB WCU reduced to 5
  success_criteria:
    services:
      - s3
      - dynamodb
    state_checks:
      - command: aws s3api get-bucket-versioning --bucket config-store
        output_contains: "Enabled"
      - command: aws s3api get-bucket-lifecycle-configuration --bucket config-store
        output_contains: "expire-old"
      - command: aws s3api get-bucket-encryption --bucket config-store
        output_contains: "AES256"
      - command: aws dynamodb describe-table --table-name sessions
        json_path: "$.Table.ProvisionedThroughput.ReadCapacityUnits"
        expected: 100
      - command: aws dynamodb describe-table --table-name sessions
        json_path: "$.Table.ProvisionedThroughput.WriteCapacityUnits"
        expected: 100
- task_id: 25
  description: >
    The following infrastructure should exist: SNS topic 'ops-alerts' with
    an SQS queue 'ops-inbox' subscribed to it. IAM role 'ops-automation'
    with the AmazonSNSFullAccess and AmazonSQSFullAccess policies attached.
    Lambda function 'alert-handler' using the 'ops-automation' role. Some
    resources may have drifted. Audit and fix.
  desired_state_spec: >
    SNS topic 'ops-alerts' with SQS subscription 'ops-inbox'.
    IAM role 'ops-automation' with AmazonSNSFullAccess and AmazonSQSFullAccess.
    Lambda 'alert-handler' using role 'ops-automation'.
  setup_commands:
    - aws sns create-topic --name ops-alerts
    - aws sqs create-queue --queue-name ops-inbox
    - >-
      aws sns subscribe --topic-arn arn:aws:sns:us-east-1:000000000000:ops-alerts
      --protocol sqs
      --notification-endpoint arn:aws:sqs:us-east-1:000000000000:ops-inbox
    - >-
      aws iam create-role --role-name ops-automation
      --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    - >-
      aws iam attach-role-policy --role-name ops-automation
      --policy-arn arn:aws:iam::aws:policy/AmazonSNSFullAccess
    - >-
      aws iam attach-role-policy --role-name ops-automation
      --policy-arn arn:aws:iam::aws:policy/AmazonSQSFullAccess
    - >-
      aws lambda create-function --function-name alert-handler
      --runtime python3.12 --handler index.handler
      --role arn:aws:iam::000000000000:role/ops-automation
      --code S3Bucket=dummy,S3Key=dummy.zip
  possible_drifts:
    - command: >-
        aws iam detach-role-policy --role-name ops-automation
        --policy-arn arn:aws:iam::aws:policy/AmazonSNSFullAccess
      description: SNS policy detached from 'ops-automation'
    - command: >-
        aws iam detach-role-policy --role-name ops-automation
        --policy-arn arn:aws:iam::aws:policy/AmazonSQSFullAccess
      description: SQS policy detached from 'ops-automation'
    - command: aws lambda delete-function --function-name alert-handler
      description: Lambda 'alert-handler' deleted
  success_criteria:
    services:
      - sns
      - sqs
      - iam
      - lambda
    state_checks:
      - command: aws sns list-subscriptions-by-topic --topic-arn arn:aws:sns:us-east-1:000000000000:ops-alerts
        output_contains: "ops-inbox"
      - command: aws iam list-attached-role-policies --role-name ops-automation
        output_contains: "SNSFullAccess"
      - command: aws iam list-attached-role-policies --role-name ops-automation
        output_contains: "SQSFullAccess"
      - command: aws lambda get-function --function-name alert-handler
        output_contains: "alert-handler"
- task_id: 128
  description: >
    The following infrastructure should exist: IAM role 'api-executor' with
    AmazonDynamoDBFullAccess and AWSLambdaBasicExecutionRole policies attached.
    Lambda function 'api-handler' with 256MB memory, 30s timeout, runtime
    python3.12, and environment variable APP_ENV=production. Some resources
    may have drifted. Audit the current state and fix any configuration that
    does not match.
  desired_state_spec: >
    IAM role 'api-executor': AmazonDynamoDBFullAccess and AWSLambdaBasicExecutionRole attached.
    Lambda 'api-handler': 256MB memory, 30s timeout, python3.12, env APP_ENV=production.
  setup_commands:
    - >-
      aws iam create-role --role-name api-executor
      --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    - >-
      aws iam attach-role-policy --role-name api-executor
      --policy-arn arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
    - >-
      aws iam attach-role-policy --role-name api-executor
      --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
    - >-
      aws lambda create-function --function-name api-handler
      --runtime python3.12 --handler index.handler
      --role arn:aws:iam::000000000000:role/api-executor
      --code S3Bucket=dummy,S3Key=dummy.zip
      --memory-size 256 --timeout 30
      --environment '{"Variables":{"APP_ENV":"production"}}'
  possible_drifts:
    - command: >-
        aws iam detach-role-policy --role-name api-executor
        --policy-arn arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
      description: DynamoDB policy detached from 'api-executor'
    - command: >-
        aws lambda update-function-configuration --function-name api-handler
        --memory-size 128
      description: Lambda memory changed from 256MB to 128MB
    - command: >-
        aws lambda update-function-configuration --function-name api-handler
        --timeout 3
      description: Lambda timeout changed from 30s to 3s
    - command: >-
        aws lambda update-function-configuration --function-name api-handler
        --environment '{"Variables":{}}'
      description: Environment variables removed from 'api-handler'
    - command: >-
        aws lambda update-function-configuration --function-name api-handler
        --runtime python3.9
      description: Lambda runtime changed from python3.12 to python3.9
  success_criteria:
    services:
      - iam
      - lambda
    state_checks:
      - command: aws iam list-attached-role-policies --role-name api-executor
        output_contains: "DynamoDBFullAccess"
      - command: aws iam list-attached-role-policies --role-name api-executor
        output_contains: "LambdaBasicExecutionRole"
      - command: aws lambda get-function-configuration --function-name api-handler
        json_path: "$.MemorySize"
        expected: 256
      - command: aws lambda get-function-configuration --function-name api-handler
        json_path: "$.Timeout"
        expected: 30
      - command: aws lambda get-function-configuration --function-name api-handler
        json_path: "$.Runtime"
        expected: "python3.12"
      - command: aws lambda get-function-configuration --function-name api-handler
        json_path: "$.Environment.Variables.APP_ENV"
        expected: "production"
- task_id: 129
  description: >
    The following infrastructure should exist: RDS instance 'app-db' with
    instance class db.t3.micro, engine mysql, multi-AZ enabled, and 7-day
    backup retention. Secrets Manager secret 'app-db/credentials' with
    description 'Database credentials for app-db'. Some resources may have
    drifted. Audit the current state and fix any configuration that does
    not match.
  desired_state_spec: >
    RDS 'app-db': db.t3.micro, mysql, multi-AZ enabled, 7-day backup retention.
    Secret 'app-db/credentials': description 'Database credentials for app-db'.
  setup_commands:
    - >-
      aws rds create-db-instance --db-instance-identifier app-db
      --db-instance-class db.t3.micro --engine mysql
      --master-username admin --master-user-password SecurePass123
      --multi-az --backup-retention-period 7
    - >-
      aws secretsmanager create-secret --name app-db/credentials
      --description 'Database credentials for app-db'
      --secret-string '{"username":"admin","password":"SecurePass123"}'
  possible_drifts:
    - command: >-
        aws rds modify-db-instance --db-instance-identifier app-db
        --no-multi-az --apply-immediately
      description: Multi-AZ disabled on 'app-db'
    - command: >-
        aws rds modify-db-instance --db-instance-identifier app-db
        --backup-retention-period 1 --apply-immediately
      description: Backup retention changed from 7 days to 1 day
    - command: >-
        aws rds modify-db-instance --db-instance-identifier app-db
        --db-instance-class db.t3.small --apply-immediately
      description: Instance class changed from db.t3.micro to db.t3.small
    - command: >-
        aws secretsmanager update-secret --secret-id app-db/credentials
        --description ''
      description: Description removed from secret 'app-db/credentials'
  success_criteria:
    services:
      - rds
      - secretsmanager
    state_checks:
      - command: aws rds describe-db-instances --db-instance-identifier app-db
        json_path: "$.DBInstances[0].MultiAZ"
        expected: true
      - command: aws rds describe-db-instances --db-instance-identifier app-db
        json_path: "$.DBInstances[0].BackupRetentionPeriod"
        expected: 7
      - command: aws rds describe-db-instances --db-instance-identifier app-db
        json_path: "$.DBInstances[0].DBInstanceClass"
        expected: "db.t3.micro"
      - command: aws secretsmanager describe-secret --secret-id app-db/credentials
        output_contains: "Database credentials for app-db"
- task_id: 131
  description: >
    The following infrastructure should exist: ECS cluster 'web-cluster',
    task definition 'web-task' (family web-task, container 'app' using
    nginx:latest on port 80), ECS service 'web-service' with desired count 3.
    IAM role 'ecs-task-role' with AmazonS3ReadOnlyAccess attached. Some
    resources may have drifted. Audit the current state and fix any
    configuration that does not match.
  desired_state_spec: >
    ECS cluster 'web-cluster', task definition 'web-task' (nginx:latest, port 80),
    service 'web-service' desired count 3.
    IAM role 'ecs-task-role': AmazonS3ReadOnlyAccess attached.
  setup_commands:
    - aws ecs create-cluster --cluster-name web-cluster
    - >-
      aws iam create-role --role-name ecs-task-role
      --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ecs-tasks.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    - >-
      aws iam attach-role-policy --role-name ecs-task-role
      --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
    - >-
      aws ecs register-task-definition --family web-task
      --container-definitions '[{"name":"app","image":"nginx:latest","portMappings":[{"containerPort":80}],"memory":256}]'
      --task-role-arn arn:aws:iam::000000000000:role/ecs-task-role
    - >-
      aws ecs create-service --cluster web-cluster
      --service-name web-service --task-definition web-task
      --desired-count 3
  possible_drifts:
    - command: >-
        aws ecs update-service --cluster web-cluster
        --service web-service --desired-count 0
      description: Service desired count changed from 3 to 0
    - command: >-
        aws iam detach-role-policy --role-name ecs-task-role
        --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
      description: S3ReadOnlyAccess policy detached from 'ecs-task-role'
    - command: >-
        aws ecs update-service --cluster web-cluster
        --service web-service --task-definition web-task
        --desired-count 1
      description: Service desired count changed from 3 to 1
  success_criteria:
    services:
      - ecs
      - iam
    state_checks:
      - command: aws ecs describe-services --cluster web-cluster --services web-service
        json_path: "$.services[0].desiredCount"
        expected: 3
      - command: aws iam list-attached-role-policies --role-name ecs-task-role
        output_contains: "S3ReadOnlyAccess"
      - command: aws iam get-role --role-name ecs-task-role
        output_contains: "ecs-task-role"
      - command: aws ecs describe-clusters --clusters web-cluster
        output_contains: "web-cluster"
- task_id: 133
  description: >
    The following infrastructure should exist: SSM parameter '/app/db-host'
    (type String, value 'db.example.com'), SSM parameter '/app/db-port'
    (type String, value '5432'). Lambda function 'config-reader' with 128MB
    memory and 10s timeout. Some resources may have drifted. Audit the
    current state and fix any configuration that does not match.
  desired_state_spec: >
    SSM '/app/db-host': String, 'db.example.com'.
    SSM '/app/db-port': String, '5432'.
    Lambda 'config-reader': 128MB memory, 10s timeout.
  setup_commands:
    - >-
      aws ssm put-parameter --name /app/db-host
      --type String --value db.example.com
    - >-
      aws ssm put-parameter --name /app/db-port
      --type String --value 5432
    - >-
      aws iam create-role --role-name config-reader-role
      --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    - >-
      aws lambda create-function --function-name config-reader
      --runtime python3.12 --handler index.handler
      --role arn:aws:iam::000000000000:role/config-reader-role
      --code S3Bucket=dummy,S3Key=dummy.zip
      --memory-size 128 --timeout 10
  possible_drifts:
    - command: >-
        aws ssm put-parameter --name /app/db-host
        --type String --value localhost --overwrite
      description: SSM '/app/db-host' value changed to 'localhost'
    - command: >-
        aws ssm put-parameter --name /app/db-port
        --type String --value 3306 --overwrite
      description: SSM '/app/db-port' value changed to '3306'
    - command: >-
        aws lambda update-function-configuration --function-name config-reader
        --memory-size 512
      description: Lambda memory changed from 128MB to 512MB
    - command: >-
        aws lambda update-function-configuration --function-name config-reader
        --timeout 60
      description: Lambda timeout changed from 10s to 60s
    - command: aws ssm delete-parameter --name /app/db-port
      description: SSM parameter '/app/db-port' deleted
  success_criteria:
    services:
      - ssm
      - lambda
    state_checks:
      - command: aws ssm get-parameter --name /app/db-host
        output_contains: "db.example.com"
      - command: aws ssm get-parameter --name /app/db-port
        output_contains: "5432"
      - command: aws lambda get-function-configuration --function-name config-reader
        json_path: "$.MemorySize"
        expected: 128
      - command: aws lambda get-function-configuration --function-name config-reader
        json_path: "$.Timeout"
        expected: 10
- task_id: 134
  description: >
    The following infrastructure should exist: EventBridge rule
    'nightly-cleanup' with schedule expression 'rate(1 day)' in enabled
    state, targeting Lambda function 'cleanup-handler'. Lambda
    'cleanup-handler' with 256MB memory and 300s timeout. Some resources
    may have drifted. Audit the current state and fix any configuration
    that does not match.
  desired_state_spec: >
    EventBridge rule 'nightly-cleanup': schedule 'rate(1 day)', ENABLED.
    Lambda 'cleanup-handler': 256MB memory, 300s timeout, target of rule.
  setup_commands:
    - >-
      aws iam create-role --role-name cleanup-handler-role
      --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    - >-
      aws lambda create-function --function-name cleanup-handler
      --runtime python3.12 --handler index.handler
      --role arn:aws:iam::000000000000:role/cleanup-handler-role
      --code S3Bucket=dummy,S3Key=dummy.zip
      --memory-size 256 --timeout 300
    - >-
      aws events put-rule --name nightly-cleanup
      --schedule-expression 'rate(1 day)' --state ENABLED
    - >-
      aws events put-targets --rule nightly-cleanup
      --targets '[{"Id":"cleanup-target","Arn":"arn:aws:lambda:us-east-1:000000000000:function:cleanup-handler"}]'
  possible_drifts:
    - command: aws events disable-rule --name nightly-cleanup
      description: EventBridge rule 'nightly-cleanup' disabled
    - command: >-
        aws events put-rule --name nightly-cleanup
        --schedule-expression 'rate(7 days)' --state ENABLED
      description: Schedule changed from 'rate(1 day)' to 'rate(7 days)'
    - command: >-
        aws events remove-targets --rule nightly-cleanup
        --ids cleanup-target
      description: Lambda target removed from rule 'nightly-cleanup'
    - command: >-
        aws lambda update-function-configuration --function-name cleanup-handler
        --timeout 30
      description: Lambda timeout changed from 300s to 30s
    - command: >-
        aws lambda update-function-configuration --function-name cleanup-handler
        --memory-size 128
      description: Lambda memory changed from 256MB to 128MB
  success_criteria:
    services:
      - events
      - lambda
    state_checks:
      - command: aws events describe-rule --name nightly-cleanup
        output_contains: "ENABLED"
      - command: aws events describe-rule --name nightly-cleanup
        output_contains: "rate(1 day)"
      - command: aws events list-targets-by-rule --rule nightly-cleanup
        output_contains: "cleanup-handler"
      - command: aws lambda get-function-configuration --function-name cleanup-handler
        json_path: "$.MemorySize"
        expected: 256
      - command: aws lambda get-function-configuration --function-name cleanup-handler
        json_path: "$.Timeout"
        expected: 300
- task_id: 135
  description: >
    The following infrastructure should exist: S3 bucket 'analytics-raw' with
    versioning enabled and AES256 server-side encryption. Firehose delivery
    stream 'clickstream-firehose' delivering to 'analytics-raw' with prefix
    'raw/' and buffer size of 5 MiB. Some resources may have drifted. Audit
    the current state and fix any configuration that does not match.
  desired_state_spec: >
    S3 'analytics-raw': versioning=Enabled, SSE with AES256.
    Firehose 'clickstream-firehose': destination analytics-raw, prefix 'raw/',
    buffer 5 MiB.
  setup_commands:
    - aws s3api create-bucket --bucket analytics-raw
    - >-
      aws s3api put-bucket-versioning --bucket analytics-raw
      --versioning-configuration Status=Enabled
    - >-
      aws s3api put-bucket-encryption --bucket analytics-raw
      --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
    - >-
      aws firehose create-delivery-stream --delivery-stream-name clickstream-firehose
      --s3-destination-configuration '{"RoleARN":"arn:aws:iam::000000000000:role/firehose-role","BucketARN":"arn:aws:s3:::analytics-raw","Prefix":"raw/","BufferingHints":{"SizeInMBs":5,"IntervalInSeconds":300}}'
  possible_drifts:
    - command: >-
        aws s3api put-bucket-versioning --bucket analytics-raw
        --versioning-configuration Status=Suspended
      description: Versioning suspended on 'analytics-raw'
    - command: aws s3api delete-bucket-encryption --bucket analytics-raw
      description: Encryption removed from 'analytics-raw'
  success_criteria:
    services:
      - firehose
      - s3
    state_checks:
      - command: aws s3api get-bucket-versioning --bucket analytics-raw
        output_contains: "Enabled"
      - command: aws s3api get-bucket-encryption --bucket analytics-raw
        output_contains: "AES256"
      - command: aws firehose describe-delivery-stream --delivery-stream-name clickstream-firehose
        output_contains: "raw/"
      - command: aws firehose describe-delivery-stream --delivery-stream-name clickstream-firehose
        output_contains: "analytics-raw"
- task_id: 139
  description: >
    The following infrastructure should exist: DynamoDB table 'users' with
    provisioned throughput of 50 RCU and 50 WCU. DynamoDB table 'transactions'
    with provisioned throughput of 100 RCU and 100 WCU, and a global secondary
    index 'date-index' on the 'date' attribute provisioned at 100 RCU / 100 WCU.
    Some resources may have drifted from the desired specification. Audit the
    current state and fix any configuration that does not match.
  desired_state_spec: >
    DynamoDB 'users': 50 RCU, 50 WCU.
    DynamoDB 'transactions': 100 RCU, 100 WCU, GSI 'date-index' at 100 RCU / 100 WCU.
  setup_commands:
    - >-
      aws dynamodb create-table --table-name users
      --attribute-definitions AttributeName=id,AttributeType=S
      --key-schema AttributeName=id,KeyType=HASH
      --provisioned-throughput ReadCapacityUnits=50,WriteCapacityUnits=50
    - >-
      aws dynamodb create-table --table-name transactions
      --attribute-definitions AttributeName=id,AttributeType=S AttributeName=date,AttributeType=S
      --key-schema AttributeName=id,KeyType=HASH
      --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100
      --global-secondary-indexes '[{"IndexName":"date-index","KeySchema":[{"AttributeName":"date","KeyType":"HASH"}],"Projection":{"ProjectionType":"ALL"},"ProvisionedThroughput":{"ReadCapacityUnits":100,"WriteCapacityUnits":100}}]'
  possible_drifts:
    - command: >-
        aws dynamodb update-table --table-name users
        --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=50
      description: Users table RCU reduced to 5
    - command: >-
        aws dynamodb update-table --table-name users
        --provisioned-throughput ReadCapacityUnits=50,WriteCapacityUnits=5
      description: Users table WCU reduced to 5
    - command: >-
        aws dynamodb update-table --table-name transactions
        --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=100
      description: Transactions table RCU reduced to 10
    - command: >-
        aws dynamodb update-table --table-name transactions
        --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=10
      description: Transactions table WCU reduced to 10
    - command: >-
        aws dynamodb update-table --table-name transactions
        --global-secondary-index-updates '[{"Update":{"IndexName":"date-index","ProvisionedThroughput":{"ReadCapacityUnits":5,"WriteCapacityUnits":5}}}]'
      description: GSI 'date-index' throughput reduced to 5 RCU / 5 WCU
  success_criteria:
    services:
      - dynamodb
    state_checks:
      - command: aws dynamodb describe-table --table-name users
        json_path: "$.Table.ProvisionedThroughput.ReadCapacityUnits"
        expected: 50
      - command: aws dynamodb describe-table --table-name users
        json_path: "$.Table.ProvisionedThroughput.WriteCapacityUnits"
        expected: 50
      - command: aws dynamodb describe-table --table-name transactions
        json_path: "$.Table.ProvisionedThroughput.ReadCapacityUnits"
        expected: 100
      - command: aws dynamodb describe-table --table-name transactions
        json_path: "$.Table.ProvisionedThroughput.WriteCapacityUnits"
        expected: 100
      - command: aws dynamodb describe-table --table-name transactions
        json_path: "$.Table.GlobalSecondaryIndexes[0].ProvisionedThroughput.ReadCapacityUnits"
        expected: 100