Tutorial 01: State is Truth
Learn why Terraform's state file is more important than your HCL code and what happens when it disappears
Brutal Truth Up Front
Terraform is not a deployment tool. It’s a state reconciliation engine that happens to provision infrastructure. Most tutorials skip this fundamental truth and jump straight to “let’s create some resources!” That’s backwards. If you don’t understand state management, you’ll create production disasters.
This tutorial forces you to break your state file intentionally, then recover from it. You’ll learn more from this controlled failure than from 10 successful deploys.
Prerequisites
- AWS Account with credentials configured (aws configure)
- Terraform installed (1.0+ recommended)
- Basic command line comfort
If you’re on Windows, use WSL2 or Git Bash. PowerShell works but adds unnecessary friction.
What You’ll Build
A single S3 bucket. That’s it. No VPCs, no EC2 instances, no complexity. We’re learning state mechanics, not AWS architecture.
The Exercise
Step 1: Create Your First Terraform Configuration
Create a working directory and a single file:
mkdir terraform-state-tutorial
cd terraform-state-tutorial
Create main.tf:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "example" {
  bucket = "terraform-tutorial-${random_id.bucket_suffix.hex}"

  tags = {
    Purpose   = "Learning Terraform State"
    ManagedBy = "Terraform"
  }
}

resource "random_id" "bucket_suffix" {
  byte_length = 4
}

output "bucket_name" {
  value = aws_s3_bucket.example.bucket
}

output "bucket_arn" {
  value = aws_s3_bucket.example.arn
}
Step 2: Initialize and Apply
terraform init
You’ll see Terraform download the AWS and random providers. This creates a .terraform/ directory and .terraform.lock.hcl file. The lock file pins provider versions - commit it to version control.
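A minimal .gitignore for this layout might look like the sketch below; the filenames are the standard Terraform artifacts, so adjust to your repo's conventions:
# .gitignore - keep provider caches and state out of version control
.terraform/
terraform.tfstate
terraform.tfstate.backup
*.tfvars   # often holds secrets; commit only if you're sure it's safe
# Do commit .terraform.lock.hcl so everyone resolves the same provider versions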
Now plan:
terraform plan
Read the output carefully. Terraform is telling you:
- It will create 2 resources
- What attributes those resources will have
- Which attributes are “known after apply” (can’t be determined until AWS responds)
Apply:
terraform apply
Type yes when prompted. Terraform creates your resources and writes terraform.tfstate.
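The outputs declared in main.tf are now recorded in state and can be read back at any time with the standard CLI:
terraform output                     # all outputs
terraform output bucket_name         # just the bucket name
terraform output -raw bucket_arn     # raw value, handy in shell scripts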
Step 3: Inspect the State File
cat terraform.tfstate
This JSON file contains:
- Resource IDs AWS assigned (bucket name, ARN)
- Current attribute values
- Resource dependencies
- Metadata about the Terraform version and providers
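A heavily trimmed sketch of that structure is shown below. Field names follow the version 4 state format, but the exact attributes and values will differ in your file:
{
  "version": 4,
  "terraform_version": "1.6.0",
  "serial": 1,
  "outputs": {
    "bucket_name": { "value": "terraform-tutorial-a1b2c3d4", "type": "string" }
  },
  "resources": [
    {
      "mode": "managed",
      "type": "aws_s3_bucket",
      "name": "example",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        { "attributes": { "bucket": "terraform-tutorial-a1b2c3d4", "arn": "arn:aws:s3:::terraform-tutorial-a1b2c3d4" } }
      ]
    }
  ]
}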
Critical insight: Your main.tf describes what you want. The state file describes what exists. Terraform compares them to determine what actions to take.
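In practice you rarely read the raw JSON. The built-in state subcommands give a safer view:
terraform state list                         # every resource Terraform is tracking
terraform state show aws_s3_bucket.example   # attributes of one resource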
Step 4: Make a Change
Edit main.tf and add another tag:
resource "aws_s3_bucket" "example" {
bucket = "terraform-tutorial-${random_id.bucket_suffix.hex}"
tags = {
Purpose = "Learning Terraform State"
ManagedBy = "Terraform"
Environment = "Dev" # New tag
}
}
Run plan:
terraform plan
Terraform shows it will update in-place. It’s not destroying the bucket because it knows the bucket exists (from state) and can modify tags without replacement.
Apply the change:
terraform apply
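For contrast, some arguments cannot be changed in place. A bucket's name is immutable, so a hypothetical edit like the one below (don't actually apply it) would show -/+ destroy and then create replacement in the plan instead of ~ update in-place:
resource "aws_s3_bucket" "example" {
  # Renaming the bucket forces replacement: S3 bucket names are immutable
  bucket = "terraform-tutorial-renamed-${random_id.bucket_suffix.hex}"

  tags = {
    Purpose     = "Learning Terraform State"
    ManagedBy   = "Terraform"
    Environment = "Dev"
  }
}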
The Break (Intentional Failure Scenario)
Now we deliberately break things to understand state’s role.
Scenario 1: Delete the State File
rm terraform.tfstate
terraform plan
What happens? Terraform shows it will create 2 new resources. Why? It has no memory that resources exist. Without state, Terraform assumes nothing exists.
Try to apply (answer no at the confirmation prompt - we only want to see what Terraform intends to do):
terraform apply
With this particular config it would not fail the way you might expect: random_id was also wiped from state, so Terraform generates a fresh suffix and would happily create a second, differently named bucket. With a hard-coded bucket name it would instead collide with the one that already exists. Either way, you're now in a broken state: the original resources exist in AWS, but Terraform doesn't know about them.
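You can confirm the mismatch directly (aws is the standard AWS CLI; the grep pattern matches the bucket prefix used in this tutorial):
terraform state list                    # prints nothing - Terraform tracks zero resources
aws s3 ls | grep terraform-tutorial     # the bucket still exists in your account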
Scenario 2: Manual Modification
First, restore your state from backup:
mv terraform.tfstate.backup terraform.tfstate
Now manually modify the bucket in the AWS console:
- Go to S3
- Find your bucket
- Add a tag:
ManualChange = "NotGood"
Back in your terminal:
terraform plan
Look closely at the output. On Terraform 1.x, plan refreshes state by default, so the output includes a note that objects have changed outside of Terraform - and because the new tag isn't in your configuration, the plan proposes removing it. Only if you run plan with -refresh=false (or on very old Terraform versions) is the drift missed entirely, because Terraform then trusts whatever the state file says.
To review drift on its own, without planning any infrastructure changes, run a refresh-only plan:
terraform plan -refresh-only
Terraform compares the real infrastructure against the state file and reports the differences; terraform apply -refresh-only then lets you accept them into state, which you'll use in the recovery section below.
The Recovery
Recovering from Deleted State
If you’ve lost your state file and resources exist, you have two options:
Option 1: Import (correct approach)
# With the state gone, Terraform believes it manages nothing
terraform destroy       # reports nothing to destroy - that's expected
terraform state list    # likewise shows nothing
# Rebuild the state file by importing the existing resources
terraform import aws_s3_bucket.example YOUR-BUCKET-NAME
terraform import random_id.bucket_suffix YOUR-RANDOM-ID   # random_id imports by its b64_url value
Importing is tedious. This is why you never lose state files in production.
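If you're on Terraform 1.5 or newer, declarative import blocks are an alternative to the CLI command: add the block to your configuration, run a plan to verify what will be imported, then apply. A sketch, where the id value is a placeholder for your real bucket name:
import {
  to = aws_s3_bucket.example
  id = "YOUR-BUCKET-NAME"   # the existing bucket's name
}
Once the import apply has succeeded, the block can be removed from your configuration.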
Option 2: Destroy manually and recreate (nuclear option)
# Delete bucket in AWS console
# Then apply fresh
terraform apply
Handling Drift
When manual changes create drift:
# Refresh state to match reality
terraform apply -refresh-only
# Then update your code to match (if you want to keep the change)
# Or apply to revert to desired state
terraform apply
In production, you typically want Terraform to be the source of truth, so you’d apply to revert manual changes.
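If there is a class of out-of-band changes you genuinely want Terraform to tolerate rather than revert (for example, tags stamped by another system), you can declare that explicitly instead of living with noisy plans. A sketch using the standard lifecycle block:
resource "aws_s3_bucket" "example" {
  bucket = "terraform-tutorial-${random_id.bucket_suffix.hex}"

  tags = {
    Purpose   = "Learning Terraform State"
    ManagedBy = "Terraform"
  }

  lifecycle {
    # Ignore tag edits made outside Terraform instead of reverting them
    ignore_changes = [tags]
  }
}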
Exit Criteria
You understand this tutorial if you can:
- Explain why deleting terraform.tfstate makes Terraform think nothing exists
- Predict whether a code change will update-in-place or force replacement
- Describe what happens when someone manually modifies infrastructure outside Terraform
- Recover from a lost state file using import
Key Lessons
- State is the source of truth, not your HCL code
- Plan compares desired state (code) vs current state (state file) vs actual infrastructure
- State loss is catastrophic in production - always use remote state with backups
- Manual changes create drift - a refresh-only plan surfaces it explicitly, and a normal apply reverts it
- State contains sensitive data - treat it like production credentials
Why This Matters in Production
In a real FedRAMP High environment, losing state means you can’t safely make changes without risking downtime. You don’t know what Terraform manages vs what was created manually. State files in S3 with versioning enabled and DynamoDB state locking prevent these scenarios.
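A minimal sketch of that setup, assuming a pre-created, versioned state bucket and a lock table (the bucket and table names below are placeholders you'd replace with your own):
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"   # pre-created, versioned S3 bucket (placeholder name)
    key            = "tutorials/state-tutorial.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"          # pre-created lock table (placeholder name)
    encrypt        = true
  }
}
After adding a backend, run terraform init -migrate-state so your existing local state is copied up.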
Every production outage I’ve seen related to Terraform involved state issues:
- Corrupted state from concurrent applies
- Lost state files from local development
- Drift from manual “emergency” changes that were never reverted
Master state management before touching production.
Next Steps
Tutorial 02: The Plan-Apply Contract - Learn to predict resource replacement vs update-in-place by understanding Terraform’s diff algorithm.
Cleanup
terraform destroy
This removes your resources. Notice how Terraform knows what to destroy - it reads from state.
Need Help Implementing This?
I help government contractors and defense organizations modernize their infrastructure using Terraform and AWS GovCloud. With 15+ years managing DoD systems and active Secret clearance, I understand compliance requirements that commercial consultants miss.