Have you ever wondered what happens when you run Terraform commands?
In this post, we will explore the following topics to gain a better understanding of Terraform internals:
Architecture and provider plugin architecture
Credential management
State locking
Debugging
Terraform architecture
Terraform uses a modular plugin-based architecture. This makes its core engine (written in Go) lightweight, while Terraform plugins do the heavy lifting of communicating with cloud provider APIs. These plugins are maintained by the respective cloud vendors e.g. AWS, GCP, etc.
But how does Terraform core communicate with these plugins?
gRPC: The bridge between core and plugins
Terraform core and provider plugins communicate through gRPC (Google Remote Procedure Call). Its usage offers multiple advantages:
Performance: gRPC is designed for low-latency, bi-directional communication. This improves the performance of Terraform plan and apply commands.
Standardization: gRPC uses protocol buffers (protobuf) to serialize data. Protobuf defines the message schemas, while gRPC handles transport, making interactions reliable and efficient.
Separation of concerns: gRPC promotes language-neutral communication between Terraform core and plugins. This means that plugins can be written in languages other than Go.
Extensibility: Providers are binaries registered via gRPC and triggered by Terraform core. Adding a new provider is as simple as creating the plugin without touching the core engine.
This design keeps Terraform flexible and future-proof, letting the community build new providers without bloating the core.
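To make that separation concrete, here is a toy Python sketch. It is not Terraform's actual plugin protocol (the real schema is defined in protobuf and spoken over gRPC); it only illustrates how a core engine can drive providers through a generic interface without knowing any cloud specifics:

```python
from abc import ABC, abstractmethod

class Provider(ABC):
    """Generic provider interface; core only ever talks to this."""
    @abstractmethod
    def plan(self, resource_type: str, config: dict) -> dict:
        """Return the change that applying `config` would make."""

class FakeAwsProvider(Provider):
    # A real provider would diff the config against the cloud API here.
    def plan(self, resource_type, config):
        return {"action": "create", "type": resource_type, "config": config}

def core_plan(provider: Provider, resources: list) -> list:
    # Core never sees AWS specifics; it just drives the interface.
    return [provider.plan(rtype, cfg) for rtype, cfg in resources]

changes = core_plan(FakeAwsProvider(), [("aws_instance", {"ami": "ami-123456"})])
print(changes[0]["action"])  # create
```

Swapping in a different provider (GCP, Datadog, even the pizza provider) changes nothing in `core_plan`; that is the property gRPC's language-neutral boundary gives the real Terraform.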
The simplicity and modularity of Terraform's plugin architecture is evident in cases where organizations have built Terraform providers beyond cloud infra use cases, e.g. the Datadog provider for setting alerts, the Grafana provider for managing Grafana operations, etc. Interestingly, some contributors have pushed the limits and even written providers to order pizza. Check it out here.
The following diagram illustrates how Terraform core delegates cloud-specific tasks to provider plugins through gRPC.

Fig 1. Terraform core delegates cloud-specific tasks to provider plugins via gRPC.
Backend interface: Supports storing Terraform state and handling operations like apply and plan in a team setting.
DAG builder: Terraform core manages resource dependencies by creating a Directed Acyclic Graph (DAG). This ensures that infrastructure operations are ordered and predictable while leveraging parallelism where possible.
gRPC client and provider plugins: External binaries that interface with cloud-specific APIs.
Let us look at the DAG builder workflow in more detail to understand how Terraform leverages dependency graphs to make infrastructure operations predictable and performant.
DAG builder workflow
Terraform core constructs a DAG to represent the relationships and dependencies between resources. This ensures resources are created, modified, or destroyed in the correct order. For example, a security group must be created before an EC2 instance that uses it. The DAG allows Terraform to:
Parallelize independent resource operations
Detect circular dependencies
Maintain correct resource creation order
Each node in the DAG is a resource and edges define dependency relationships.
DAG visual representation

Fig 2. DAG visual representation
We need to provision aws_instance.web, which depends on the following resources:
aws_vpc.main — the VPC where the instance will run.
aws_security_group.web — the security group to be attached to the instance.
aws_iam_role.app — the IAM role associated with the instance.
The DAG flow ensures that the prerequisite infrastructure resources are created before the AWS instance.
This process leads to efficient and deterministic infrastructure provisioning.
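The ordering logic above can be sketched with a Kahn-style topological sort. This is an illustration, not Terraform's actual graph code: it groups independent resources into levels that could run in parallel, and it fails loudly on circular dependencies, using the dependencies from Fig 2:

```python
from collections import deque

def parallel_levels(deps):
    """Topologically sort a DAG, grouping independent nodes per level."""
    indegree = {node: len(parents) for node, parents in deps.items()}
    dependents = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            dependents[parent].append(node)
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    levels, visited = [], 0
    while ready:
        level = sorted(ready)   # everything here has no unmet dependencies
        ready.clear()
        levels.append(level)
        visited += len(level)
        for node in level:
            for child in dependents[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    if visited != len(deps):    # leftover nodes mean a cycle
        raise ValueError("circular dependency detected")
    return levels

# Dependencies from Fig 2: the instance waits on the VPC, SG, and IAM role.
deps = {
    "aws_vpc.main": set(),
    "aws_iam_role.app": set(),
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_instance.web": {"aws_vpc.main", "aws_security_group.web", "aws_iam_role.app"},
}
print(parallel_levels(deps))
```

Here the VPC and IAM role are independent, so they land in the same first level (parallelizable), the security group follows the VPC, and the instance comes last.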
What happens when you run Terraform apply?
Now that we have a basic overview of how Terraform’s architecture works, let’s look at what happens when you run the terraform apply command.
The 3 common commands that any Terraform user runs when provisioning infrastructure are:
init - initializes Terraform: downloads provider plugins and performs initial housekeeping tasks.
plan - dry run: describes the operations to be performed.
apply - runs the operations.
We will understand the apply command in depth by using a simple aws_instance resource example:
resource "aws_instance" "web" {
  ami           = "ami-123456"
  instance_type = "t2.micro"
}
Terraform code is written in HCL (HashiCorp Configuration Language).
The above resource creation goes through these steps:

Fig 3. Terraform resource creation
Parsing the configuration: Terraform core reads your HCL file. Assuming init has already been run, the provider plugins are present to communicate with the provider. Terraform then parses the resource and identifies its type (aws_instance), its name (web), and all the attributes you’ve set (ami and instance_type).
Building the dependency graph (DAG): Terraform automatically builds a DAG of resources, inferring dependencies, sometimes even ones you didn’t explicitly declare. As seen earlier, each resource becomes a node, and dependencies form the edges.
Provider interaction: The AWS provider steps in, having been registered during the init call. It takes your resource definition and translates it into an AWS-specific API call. For an EC2 instance, this means preparing a request for the AWS RunInstances API.
Making the API call: Using your credentials (from environment variables, config files, or elsewhere; we will touch upon this topic later), the provider sends a request like this:
POST https://ec2.amazonaws.com/?Action=RunInstances
{
  "ImageId": "ami-123456",
  "InstanceType": "t2.micro",
  ...other default parameters...
}
Handling the response: Terraform handles the contract between the API call and what it shows the end user. If the call succeeds, Terraform parses the response metadata and extracts instance properties like the instance ID, public and private IPs, and more.
Updating state: Whenever any resource operation occurs, Terraform’s internal state changes. This change needs to be preserved for the next run. Depending on where you maintain state (local or remote), Terraform updates all the relevant attributes in the state file under aws_instance.web. This state is crucial, as it lets Terraform track what’s been created, detect drift, and reference attributes in other resources.
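For intuition, a heavily simplified state entry for this resource might look roughly like the following (the instance ID is illustrative; the real terraform.tfstate contains many more fields and should never be edited by hand):

```json
{
  "resources": [
    {
      "type": "aws_instance",
      "name": "web",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "attributes": {
            "id": "i-0abcd1234",
            "ami": "ami-123456",
            "instance_type": "t2.micro"
          }
        }
      ]
    }
  ]
}
```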
We assumed that Terraform can talk to the cloud provider through credentials stored somewhere. Let us take the example of AWS and see the different ways in which Terraform credentials are managed. Similar setups exist for other cloud providers.
Credential management
Terraform does not store credentials in the state file. They are passed securely at runtime using the following precedence:
Environment Variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
Credentials Files:
~/.aws/credentials
Other methods
Rather than storing permanent credentials on disk, you can use more secure methods that obtain temporary credentials, such as AWS STS assume role.
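For example, the AWS provider can assume an IAM role via STS directly in configuration, so Terraform runs with short-lived credentials; the role ARN below is a placeholder:

```hcl
provider "aws" {
  region = "us-east-1"

  assume_role {
    # Placeholder ARN: point this at a role scoped to your Terraform runs.
    role_arn     = "arn:aws:iam::123456789012:role/terraform-deployer"
    session_name = "terraform"
  }
}
```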
Terraform State locking
Terraform can maintain state locally or remotely. Local state involves writing state to an on-disk file. This is not recommended, as it keeps the record of shared cloud changes on someone’s laptop. This can be prevented by using central remote storage like AWS S3.
Using central storage brings up another problem: what if multiple users on the same team update the state in parallel? This could corrupt the state, with the last update overwriting any previous versions.
AWS S3 has a versioning feature that records each change. However, while versioning is useful for auditing who made the last change, it doesn’t prevent parallel changes.
To address this challenge, Terraform has a feature to enable state locking through services like AWS DynamoDB.
Locking ensures that only one user can modify the state at any given time.
How does it work?
Terraform stores a lock with a unique LockID in DynamoDB.
When a Terraform operation (like apply or destroy) starts, it attempts to acquire the lock.
If another operation is in progress and the lock is already held, Terraform will wait (and retry) until the lock is released.
Other cloud providers have DynamoDB equivalents, e.g. GCP has Cloud Storage object locks. The implementation details differ, but the concept is the same: whoever holds the lock updates the state.
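The core mechanism is a conditional write: create the lock item only if it does not already exist. Here is a toy in-memory Python sketch standing in for DynamoDB's conditional PutItem (the real Terraform lock item also carries operation metadata, not just an owner):

```python
import threading

class LockTable:
    """In-memory stand-in for a DynamoDB table used for state locking."""
    def __init__(self):
        self._items = {}
        self._mutex = threading.Lock()  # models DynamoDB's atomic conditional write

    def acquire(self, lock_id, owner):
        # Like PutItem with ConditionExpression "attribute_not_exists(LockID)":
        # the write succeeds only if nobody holds the lock yet.
        with self._mutex:
            if lock_id in self._items:
                return False
            self._items[lock_id] = owner
            return True

    def release(self, lock_id, owner):
        # Only the holder may delete its own lock entry.
        with self._mutex:
            if self._items.get(lock_id) == owner:
                del self._items[lock_id]

table = LockTable()
assert table.acquire("env/dev/terraform.tfstate", "alice")    # alice gets the lock
assert not table.acquire("env/dev/terraform.tfstate", "bob")  # bob must wait and retry
table.release("env/dev/terraform.tfstate", "alice")
assert table.acquire("env/dev/terraform.tfstate", "bob")      # now bob can proceed
```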
Example configuration
terraform {
  backend "s3" {
    bucket         = "my-tf-state-bucket"
    key            = "env/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
Benefits of using state locking
Prevents race conditions and state corruption.
Enforces serialization of infrastructure changes.
This setup is critical for production environments or any setup where multiple users or automated systems are managing infrastructure via Terraform.
Debugging Terraform
As we wrap up this overview of Terraform architecture and core concepts, it's important to know how to troubleshoot when things don’t go as planned. That’s where Terraform debugging comes in.
Whether you're dealing with unexpected plan results, authentication errors, or strange provider behavior, Terraform’s detailed structured logging helps you understand the underlying issues.
TF_LOG Levels
Terraform exposes different logging levels you can use to control the verbosity of output during command execution:
TRACE – Most detailed (everything).
DEBUG – Useful internal information (e.g., API calls, provider logic).
INFO – General progress messages.
WARN – Non-critical issues or warnings.
ERROR – Only errors that stop execution.
How to enable debug mode
You can set the log level with the TF_LOG environment variable:

TF_LOG=DEBUG terraform plan

To avoid cluttering your terminal, or to keep an on-disk record of logs, point the TF_LOG_PATH variable at a file:

TF_LOG=DEBUG TF_LOG_PATH=terraform.log terraform plan
What will you see in logs?
Terraform graph construction and execution.
Provider plugin handshake details.
Credential loading and authentication steps.
Cloud API requests and responses.
Backend configuration and state interactions.
End-to-end workflow diagram
Throughout this guide, we’ve explored how Terraform works under the hood - starting with how it parses .tf configuration files, builds a dependency graph (DAG), interacts with provider plugins via gRPC, and communicates with cloud APIs to provision infrastructure.
Now, to tie everything together, here’s a high-level diagram that illustrates Terraform’s end-to-end workflow, from reading your code to updating the state file:

Fig 4. Terraform workflow diagram
This workflow is at the heart of how Terraform delivers reliable, repeatable infrastructure automation across any cloud provider.
With this full picture in mind, you're well-equipped to not only write Terraform code but also understand how and why it works behind the scenes.
We have a team of experts who can help you streamline your infrastructure operations, SRE and Platform engineering initiatives. Reach out to us here
Separation of concerns: gRPC promotes language-neutral communication between Terraform core and plugins. This means that plugins can be written in languages other than Go.
Extensibility: Providers are binaries registered via gRPC and triggered by Terraform core. Adding a new provider is as simple as creating the plugin without touching the core engine.
This design keeps Terraform flexible and future-proof, letting the community build new providers without bloating the core.
The simplicity and modularity of Terraform plugin architecture is evident in cases where organizations have built Terraform providers beyond cloud infra use cases e.g. Datadog provider for setting alerts, Grafana provider managing Grafana operations, etc. Interestingly, some contributors have pushed the limits and even wrote providers to order Pizza. Check it out here.
The following diagram illustrates how Terraform core delegates cloud-specific tasks to provider plugins through gRPC.

Fig 1. Terraform core delegates cloud-specific tasks to provider plugins via gRPC.
Backend interface: Supports storing Terraform state and handling operations like apply and plan in a team setting.
DAG builder: Terraform core manages resource dependencies by creating a Directed Acyclic Graph (DAG). This ensures that infrastructure operations are ordered and predictable while leveraging parallelism where possible.
gRPC client and provider plugins: External binaries that interface with cloud-specific APIs.
Let us look at the DAG builder workflow in more detail to understand how Terraform leverages dependency graphs to make infrastructure operations predictable and performant.
DAG builder workflow
Terraform core constructs a DAG to represent the relationships and dependencies between resources. This ensures resources are created, modified, or destroyed in the correct order. For example, a security group must be created before an EC2 instance that uses it. The DAG allows Terraform to:
Parallelize independent resource operations
Detect circular dependencies
Maintain correct resource creation order
Each node in the DAG is a resource and edges define dependency relationships.
DAG visual representation

Fig 2. DAG visual representation
We need to provision aws_instance.web, which depends on the following resources:
aws_vpc.main — the VPC where the instance will run.
aws_security_group.web — the security group to be attached to the instance.
aws_iam_role.app — the IAM role associated with the instance.
The DAG flow ensures that the prerequisite infrastructure resources are created before the AWS instance.
This process leads to efficient and deterministic infrastructure provisioning.
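To make the ordering concrete, here is a small Python sketch using the standard-library graphlib (Terraform's actual implementation is in Go; the graph below mirrors Fig 2, and the security group's dependency on the VPC is an assumption for illustration). Nodes in the same batch have no unmet dependencies and could be processed in parallel:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key depends on the nodes in its set.
deps = {
    "aws_instance.web": {"aws_vpc.main", "aws_security_group.web", "aws_iam_role.app"},
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_vpc.main": set(),
    "aws_iam_role.app": set(),
}

ts = TopologicalSorter(deps)
ts.prepare()  # also raises CycleError if a circular dependency exists

batches = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # nodes whose dependencies are all satisfied
    batches.append(ready)
    ts.done(*ready)

print(batches)
# The VPC and IAM role come first (independent of each other), then the
# security group, and the instance is created last.
```

The same walk also shows why cycle detection falls out of the DAG for free: prepare() refuses any graph that cannot be topologically ordered.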
What happens when you run Terraform apply?
Now that we have a basic overview of how Terraform’s architecture works, let’s look at what happens when you run the Terraform apply command.
The three common commands that any Terraform user runs when provisioning infrastructure are:
init - initializes Terraform: downloads provider plugins and performs initial housekeeping tasks.
plan - performs a dry run that describes the operations to be carried out.
apply - executes the operations.
We will understand the apply command in depth using a simple aws_instance resource example:
resource "aws_instance" "web" {
  ami           = "ami-123456"
  instance_type = "t2.micro"
}
Terraform code is written in HCL (HashiCorp Configuration Language).
The above resource creation goes through these steps:

Fig 3. Terraform resource creation
Parsing the Configuration: Terraform core reads your HCL file. Assuming init has already been run, the provider plugins are present to communicate with the provider. Terraform then parses the resource, figuring out the type (aws_instance), the name (web), and all the attributes you’ve set (ami and instance_type).
Building the Dependency Graph (DAG): Terraform automatically builds a DAG of resources, inferring dependencies - sometimes even ones you didn’t explicitly declare. As seen earlier, each resource becomes a node, and dependencies form the edges.
Provider Interaction: The AWS provider steps in, having been registered during the init call. It takes your resource definition and translates it into an AWS-specific API call. For an EC2 instance, this means preparing a request for the AWS RunInstances API.
Making the API Call: Using your credentials (from environment variables, config files, or elsewhere - we will touch upon this topic later), the provider sends a request like this:
POST https://ec2.amazonaws.com/?Action=RunInstances
{
  "ImageId": "ami-123456",
  "InstanceType": "t2.micro",
  ...other default parameters...
}
Handling the Response: Terraform handles the contract between the API call and what it shows the end user. If the call succeeds, Terraform parses the response metadata and extracts instance properties like instance_id, public and private IPs, and more.
Updating State: Whenever any resource operation occurs, Terraform’s internal state changes. This change needs to be preserved for the next run. Depending on where you maintain state (local or remote), Terraform updates all the relevant attributes in the state file under aws_instance.web. This state is crucial: it lets Terraform track what’s been created, detect drift, and reference attributes in other resources.
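The last two steps can be sketched as a toy in Python. The field names below are illustrative, not Terraform's real state schema, and the responses are made up:

```python
# Toy sketch: map a trimmed, hypothetical RunInstances response into
# resource attributes and record them under the resource address, the
# way Terraform tracks resources in its state file.
def extract_attrs(response):
    return {
        "id": response["InstanceId"],
        "private_ip": response["PrivateIpAddress"],
    }

state = {"resources": {}}
api_response = {"InstanceId": "i-0abc123", "PrivateIpAddress": "10.0.1.5"}
state["resources"]["aws_instance.web"] = extract_attrs(api_response)

# A later run can detect drift by re-reading the resource and diffing
# the refreshed attributes against what the state remembers:
refreshed = {"InstanceId": "i-0abc123", "PrivateIpAddress": "10.0.9.9"}
drift = extract_attrs(refreshed) != state["resources"]["aws_instance.web"]
print(drift)  # True: the private IP changed outside Terraform
```

This diff-against-recorded-state step is exactly what powers the change summary you see in a plan.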
We assumed that Terraform can talk to the cloud provider through credentials stored somewhere. Let us take the example of AWS and see the different ways in which Terraform credentials are managed. Similar setups exist for other cloud providers.
Credential management
Terraform does not store credentials in the state file. They are passed securely at runtime using the following precedence:
Environment Variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
Credentials Files:
~/.aws/credentials
Other methods
Rather than storing permanent credentials on disk, there are more secure methods to obtain temporary credentials, such as AWS STS AssumeRole.
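As a rough sketch of that precedence (the real AWS SDK credential chain has more steps, e.g. config profiles, SSO, and instance metadata), resolution looks something like this:

```python
import os

def resolve_aws_credentials(env=None):
    """Simplified precedence sketch: environment variables win over the
    shared credentials file. Illustrative only, not the SDK's full chain."""
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"source": "environment", "access_key_id": key}
    creds_file = os.path.expanduser("~/.aws/credentials")
    if os.path.exists(creds_file):
        return {"source": "credentials_file", "path": creds_file}
    return {"source": "none"}

# Environment variables take precedence when both are set:
fake_env = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "secret"}
print(resolve_aws_credentials(fake_env)["source"])  # environment
```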
Terraform State locking
Terraform can maintain state locally or remotely. Local state involves writing state to an on-disk file. This is not recommended, as it keeps the record of shared cloud infrastructure on one person’s laptop. This can be avoided by using central remote storage like S3.
Using central storage brings up another problem: what if multiple users on the same team update the state in parallel? This could corrupt the state, with the last update overwriting any previous versions.
AWS S3 has a versioning feature that records each change. However, versioning is useful for auditing who made the last change; it does not prevent parallel writes.
To address this challenge, Terraform has a feature to enable state locking through services like AWS DynamoDB.
Locking ensures that only one user can modify the state at any given time.
How does it work?
In DynamoDB, Terraform stores a lock item with a unique LockID.
When a Terraform operation (like apply or destroy) starts, it attempts to acquire the lock.
If another operation is in progress and the lock is already held, Terraform will wait (and retry) until the lock is released.
Other cloud providers have DynamoDB equivalents, e.g. GCP has Cloud Storage object locks. The implementation details differ, but the concept is the same: whoever holds the lock updates the state.
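The acquire-or-wait behaviour can be modelled with a toy in-memory lock table. DynamoDB implements the same idea with a conditional write that fails if an item with the same LockID already exists; this sketch is illustrative, not Terraform's actual client:

```python
class StateLockTable:
    """Toy stand-in for the DynamoDB lock table."""
    def __init__(self):
        self._items = {}

    def try_acquire(self, lock_id, owner):
        # Mimics a conditional put: succeeds only if the LockID is absent.
        if lock_id in self._items:
            return False
        self._items[lock_id] = owner
        return True

    def release(self, lock_id, owner):
        # Only the holder may release its own lock.
        if self._items.get(lock_id) == owner:
            del self._items[lock_id]

table = StateLockTable()
state_key = "env/dev/terraform.tfstate"

acquired_by_alice = table.try_acquire(state_key, "alice")  # True: lock was free
blocked_bob = table.try_acquire(state_key, "bob")          # False: alice holds it
table.release(state_key, "alice")
acquired_by_bob = table.try_acquire(state_key, "bob")      # True: lock released

print(acquired_by_alice, blocked_bob, acquired_by_bob)
```

In practice, a blocked client retries this acquire in a loop until the lock frees up, which is exactly the waiting behaviour described above.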
Example configuration
terraform {
  backend "s3" {
    bucket         = "my-tf-state-bucket"
    key            = "env/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
Benefits of using state locking
Prevents race conditions and state corruption.
Enforces serialization of infrastructure changes.
This setup is critical for production environments or any setup where multiple users or automated systems are managing infrastructure via Terraform.
Debugging Terraform
As we wrap up this overview of Terraform architecture and core concepts, it's important to know how to troubleshoot when things don’t go as planned. That’s where Terraform debugging comes in.
Whether you're dealing with unexpected plan results, authentication errors, or strange provider behavior, Terraform’s detailed structured logging helps you understand the underlying issues.
TF_LOG Levels
Terraform exposes different logging levels you can use to control the verbosity of output during command execution:
TRACE – Most detailed (everything).
DEBUG – Useful internal information (e.g., API calls, provider logic).
INFO – General progress messages.
WARN – Non-critical issues or warnings.
ERROR – Only errors that stop execution.
How to enable debug mode
You can set the log level with the TF_LOG environment variable:
TF_LOG=DEBUG terraform apply
To avoid cluttering your terminal or to keep an on-disk record of logs, also set the TF_LOG_PATH variable to a file path:
TF_LOG=DEBUG TF_LOG_PATH=terraform.log terraform apply
What will you see in logs?
Terraform graph construction and execution.
Provider plugin handshake details.
Credential loading and authentication steps.
Cloud API requests and responses.
Backend configuration and state interactions.
End-to-end workflow diagram
Throughout this guide, we’ve explored how Terraform works under the hood - starting with how it parses .tf configuration files, builds a dependency graph (DAG), interacts with provider plugins via gRPC, and communicates with cloud APIs to provision infrastructure.
Now, to tie everything together, here’s a high-level diagram that illustrates Terraform’s end-to-end workflow, from reading your code to updating the state file:

Fig 4. Terraform workflow diagram
This workflow is at the heart of how Terraform delivers reliable, repeatable infrastructure automation across any cloud provider.
With this full picture in mind, you're well-equipped to not only write Terraform code but also understand how and why it works behind the scenes.
We have a team of experts who can help you streamline your infrastructure operations, SRE, and Platform engineering initiatives. Reach out to us here.