Tutorial 01: State is Truth
Learn why Terraform's state file is more important than your HCL code and what happens when it disappears
Brutal Truth Up Front
Terraform is not a deployment tool. It’s a state reconciliation engine that happens to provision infrastructure. Most tutorials skip this fundamental truth and jump straight to “let’s create some resources!” That’s backwards. If you don’t understand state management, you’ll create production disasters.
This tutorial forces you to break your state file intentionally, then recover from it. You’ll learn more from this controlled failure than from 10 successful deploys.
Prerequisites
- AWS Account with credentials configured (aws configure)
- Terraform installed (1.0+ recommended)
- Basic command line comfort
If you’re on Windows, use WSL2 or Git Bash. PowerShell works but adds unnecessary friction.
What You’ll Build
A single S3 bucket. That’s it. No VPCs, no EC2 instances, no complexity. We’re learning state mechanics, not AWS architecture.
The Exercise
Step 1: Create Your First Terraform Configuration
Create a working directory and a single file:
mkdir terraform-state-tutorial
cd terraform-state-tutorial
Create main.tf:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "example" {
  bucket = "terraform-tutorial-${random_id.bucket_suffix.hex}"

  tags = {
    Purpose   = "Learning Terraform State"
    ManagedBy = "Terraform"
  }
}

resource "random_id" "bucket_suffix" {
  byte_length = 4
}

output "bucket_name" {
  value = aws_s3_bucket.example.bucket
}

output "bucket_arn" {
  value = aws_s3_bucket.example.arn
}
Step 2: Initialize and Apply
terraform init
You’ll see Terraform download the AWS and random providers. This creates a .terraform/ directory and .terraform.lock.hcl file. The lock file pins provider versions - commit it to version control.
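A minimal .gitignore for this layout might look like the sketch below; the filenames are the standard Terraform artifacts, so adjust to your repo's conventions:
# .gitignore - keep provider caches and state out of version control
.terraform/
terraform.tfstate
terraform.tfstate.backup
*.tfvars   # often holds secrets; commit only if you're sure it's safe
# Do commit .terraform.lock.hcl so everyone resolves the same provider versions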
Now plan:
terraform plan
Read the output carefully. Terraform is telling you:
- It will create 2 resources
- What attributes those resources will have
- Which attributes are “known after apply” (can’t be determined until AWS responds)
Apply:
terraform apply
Type yes when prompted. Terraform creates your resources and writes terraform.tfstate.
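The outputs declared in main.tf are now recorded in state and can be read back at any time with the standard CLI:
terraform output                     # all outputs
terraform output bucket_name         # just the bucket name
terraform output -raw bucket_arn     # raw value, handy in shell scripts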
Step 3: Inspect the State File
cat terraform.tfstate
This JSON file contains:
- Resource IDs AWS assigned (bucket name, ARN)
- Current attribute values
- Resource dependencies
- Metadata about the Terraform version and providers
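A heavily trimmed sketch of that structure is shown below. Field names follow the version 4 state format, but the exact attributes and values will differ in your file:
{
  "version": 4,
  "terraform_version": "1.6.0",
  "serial": 1,
  "outputs": {
    "bucket_name": { "value": "terraform-tutorial-a1b2c3d4", "type": "string" }
  },
  "resources": [
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "example",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        { "attributes": { "bucket": "terraform-tutorial-a1b2c3d4", "arn": "arn:aws:s3:::terraform-tutorial-a1b2c3d4" } }
      ]
    }
  ]
}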
Critical insight: Your main.tf describes what you want. The state file describes what exists. Terraform compares them to determine what actions to take.
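In practice you rarely read the raw JSON. The built-in state subcommands give a safer view:
terraform state list                         # every resource Terraform is tracking
terraform state show aws_s3_bucket.example   # attributes of one resource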
Step 4: Make a Change
Edit main.tf and add another tag:
resource "aws_s3_bucket" "example" {
bucket = "terraform-tutorial-${random_id.bucket_suffix.hex}"
tags = {
Purpose = "Learning Terraform State"
ManagedBy = "Terraform"
Environment = "Dev" # New tag
}
}
Run plan:
terraform plan
Terraform shows it will update in-place. It’s not destroying the bucket because it knows the bucket exists (from state) and can modify tags without replacement.
Apply the change:
terraform apply
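For contrast, some arguments cannot be changed in place. A bucket's name is immutable, so a hypothetical edit like the one below (don't actually apply it) would show -/+ destroy and then create replacement in the plan instead of ~ update in-place:
resource "aws_s3_bucket" "example" {
  # Renaming the bucket forces replacement: S3 bucket names are immutable
  bucket = "terraform-tutorial-renamed-${random_id.bucket_suffix.hex}"

  tags = {
    Purpose     = "Learning Terraform State"
    ManagedBy   = "Terraform"
    Environment = "Dev"
  }
}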
The Break (Intentional Failure Scenario)
Now we deliberately break things to understand state’s role.
Scenario 1: Delete the State File
rm terraform.tfstate
terraform plan
What happens? Terraform shows it will create 2 new resources. Why? It has no memory that resources exist. Without state, Terraform assumes nothing exists.
Try to apply (answer no at the confirmation prompt - we only want to see what Terraform intends to do):
terraform apply
With this particular config it would not fail the way you might expect: random_id was also wiped from state, so Terraform generates a fresh suffix and would happily create a second, differently named bucket. With a hard-coded bucket name it would instead collide with the one that already exists. Either way, you're now in a broken state: the original resources exist in AWS, but Terraform doesn't know about them.
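You can confirm the mismatch directly (aws is the standard AWS CLI; the grep pattern matches the bucket prefix used in this tutorial):
terraform state list                    # prints nothing - Terraform tracks zero resources
aws s3 ls | grep terraform-tutorial     # the bucket still exists in your account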
Scenario 2: Manual Modification
First, restore your state from backup:
mv terraform.tfstate.backup terraform.tfstate
Now manually modify the bucket in the AWS console:
- Go to S3
- Find your bucket
- Add a tag:
ManualChange = "NotGood"
Back in your terminal:
terraform plan
Look closely at the output. On Terraform 1.x, plan refreshes state by default, so the output includes a note that objects have changed outside of Terraform - and because the new tag isn't in your configuration, the plan proposes removing it. Only if you run plan with -refresh=false (or on very old Terraform versions) is the drift missed entirely, because Terraform then trusts whatever the state file says.
To review drift on its own, without planning any infrastructure changes, run a refresh-only plan:
terraform plan -refresh-only
Terraform compares the real infrastructure against the state file and reports the differences; terraform apply -refresh-only then lets you accept them into state, which you'll use in the recovery section below.
The Recovery
Recovering from Deleted State
If you’ve lost your state file and resources exist, you have two options:
Option 1: Import (correct approach)
# With the state gone, Terraform believes it manages nothing
terraform destroy       # reports nothing to destroy - that's expected
terraform state list    # likewise shows nothing
# Rebuild the state file by importing the existing resources
terraform import aws_s3_bucket.example YOUR-BUCKET-NAME
terraform import random_id.bucket_suffix YOUR-RANDOM-ID   # random_id imports by its b64_url value
Importing is tedious. This is why you never lose state files in production.
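If you're on Terraform 1.5 or newer, declarative import blocks are an alternative to the CLI command: add the block to your configuration, run a plan to verify what will be imported, then apply. A sketch, where the id value is a placeholder for your real bucket name:
import {
  to = aws_s3_bucket.example
  id = "YOUR-BUCKET-NAME"   # the existing bucket's name
}
Once the import apply has succeeded, the block can be removed from your configuration.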
Option 2: Destroy manually and recreate (nuclear option)
# Delete bucket in AWS console
# Then apply fresh
terraform apply
Handling Drift
When manual changes create drift:
# Refresh state to match reality
terraform apply -refresh-only
# Then update your code to match (if you want to keep the change)
# Or apply to revert to desired state
terraform apply
In production, you typically want Terraform to be the source of truth, so you’d apply to revert manual changes.
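If there is a class of out-of-band changes you genuinely want Terraform to tolerate rather than revert (for example, tags stamped by another system), you can declare that explicitly instead of living with noisy plans. A sketch using the standard lifecycle block:
resource "aws_s3_bucket" "example" {
  bucket = "terraform-tutorial-${random_id.bucket_suffix.hex}"

  tags = {
    Purpose   = "Learning Terraform State"
    ManagedBy = "Terraform"
  }

  lifecycle {
    # Ignore tag edits made outside Terraform instead of reverting them
    ignore_changes = [tags]
  }
}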
Exit Criteria
You understand this tutorial if you can:
- Explain why deleting terraform.tfstate makes Terraform think nothing exists
- Predict whether a code change will update-in-place or force replacement
- Describe what happens when someone manually modifies infrastructure outside Terraform
- Recover from a lost state file using import
Key Lessons
- State is the source of truth, not your HCL code
- Plan compares desired state (code) vs current state (state file) vs actual infrastructure
- State loss is catastrophic in production - always use remote state with backups
- Manual changes create drift - a refresh-only plan surfaces it explicitly, and a normal apply reverts it
- State contains sensitive data - treat it like production credentials
Why This Matters in Production
In a real FedRAMP High environment, losing state means you can’t safely make changes without risking downtime. You don’t know what Terraform manages vs what was created manually. State files in S3 with versioning enabled and DynamoDB state locking prevent these scenarios.
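A minimal sketch of that setup, assuming a pre-created, versioned state bucket and a lock table (the bucket and table names below are placeholders you'd replace with your own):
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"   # pre-created, versioned S3 bucket (placeholder name)
    key            = "tutorials/state-tutorial.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"          # pre-created lock table (placeholder name)
    encrypt        = true
  }
}
After adding a backend, run terraform init -migrate-state so your existing local state is copied up.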
Every production outage I’ve seen related to Terraform involved state issues:
- Corrupted state from concurrent applies
- Lost state files from local development
- Drift from manual “emergency” changes that were never reverted
Master state management before touching production.
Next Steps
Tutorial 02: The Plan-Apply Contract - Learn to predict resource replacement vs update-in-place by understanding Terraform’s diff algorithm.
Cleanup
terraform destroy
This removes your resources. Notice how Terraform knows what to destroy - it reads from state.
Need Help Implementing This?
I help government contractors and defense organizations modernize their infrastructure using Terraform and AWS GovCloud. With 15+ years managing DoD systems and active Secret clearance, I understand compliance requirements that commercial consultants miss.