Have you ever wondered what happens when you run Terraform commands?
In this post, we will explore the following topics to gain a better understanding of Terraform internals:
Architecture and provider plugin architecture
Credential management
State locking
Debugging
Terraform architecture
Terraform uses a modular plugin-based architecture. This makes its core engine (written in Go) lightweight, while Terraform plugins do the heavy lifting of communicating with cloud provider APIs. These plugins are maintained by the respective cloud vendors e.g. AWS, GCP, etc.
But how does Terraform core communicate with these plugins?
gRPC: The bridge between core and plugins
Terraform core and provider plugins communicate through gRPC (Google Remote Procedure Call). Its usage offers multiple advantages:
Performance: gRPC is designed for low-latency, bi-directional communication. This improves the performance of Terraform plan and apply commands.
Standardization: gRPC uses protocol buffers (protobuf) to serialize data. Protobuf defines the message schemas, while gRPC handles transport, making interactions reliable and efficient.
Separation of concerns: gRPC promotes language-neutral communication between Terraform core and plugins. This means that plugins can be written in languages other than Go.
Extensibility: Providers are binaries registered via gRPC and triggered by Terraform core. Adding a new provider is as simple as creating the plugin without touching the core engine.
This design keeps Terraform flexible and future-proof, letting the community build new providers without bloating the core.
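To make that separation concrete, here is a toy Python sketch. It is not Terraform's actual plugin protocol (the real schema is defined in protobuf and spoken over gRPC); it only illustrates how a core engine can drive providers through a generic interface without knowing any cloud specifics:

```python
from abc import ABC, abstractmethod

class Provider(ABC):
    """Generic provider interface; core only ever talks to this."""
    @abstractmethod
    def plan(self, resource_type: str, config: dict) -> dict:
        """Return the change that applying `config` would make."""

class FakeAwsProvider(Provider):
    # A real provider would diff the config against the cloud API here.
    def plan(self, resource_type, config):
        return {"action": "create", "type": resource_type, "config": config}

def core_plan(provider: Provider, resources: list) -> list:
    # Core never sees AWS specifics; it just drives the interface.
    return [provider.plan(rtype, cfg) for rtype, cfg in resources]

changes = core_plan(FakeAwsProvider(), [("aws_instance", {"ami": "ami-123456"})])
print(changes[0]["action"])  # create
```

Swapping in a different provider (GCP, Datadog, even the pizza provider) changes nothing in `core_plan`; that is the property gRPC's language-neutral boundary gives the real Terraform.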
The simplicity and modularity of Terraform's plugin architecture is evident in cases where organizations have built Terraform providers beyond cloud infra use cases, e.g. the Datadog provider for setting alerts, the Grafana provider for managing Grafana operations, etc. Interestingly, some contributors have pushed the limits and even written providers to order pizza. Check it out here.
The following diagram illustrates how Terraform core delegates cloud-specific tasks to provider plugins through gRPC.

Fig 1. Terraform core delegates cloud-specific tasks to provider plugins via gRPC.
Backend interface: Supports storing Terraform state and handling operations like apply and plan in a team setting.
DAG builder: Terraform core manages resource dependencies by creating a Directed Acyclic Graph (DAG). This ensures that infrastructure operations are ordered and predictable while leveraging parallelism where possible.
gRPC client and provider plugins: External binaries that interface with cloud-specific APIs.
Let us look at the DAG builder workflow in more detail to understand how Terraform leverages dependency graphs to make infrastructure operations predictable and performant.
DAG builder workflow
Terraform core constructs a DAG to represent the relationships and dependencies between resources. This ensures resources are created, modified, or destroyed in the correct order. For example, a security group must be created before an EC2 instance that uses it. The DAG allows Terraform to:
Parallelize independent resource operations
Detect circular dependencies
Maintain correct resource creation order
Each node in the DAG is a resource and edges define dependency relationships.
DAG visual representation

Fig 2. DAG visual representation
We need to provision aws_instance.web, which depends on the following resources:
aws_vpc.main — the VPC where the instance will run.
aws_security_group.web — the security group to be attached to the instance.
aws_iam_role.app — the IAM role associated with the instance.
The DAG flow ensures that the prerequisite infrastructure resources are created before the AWS instance.
This process leads to efficient and deterministic infrastructure provisioning.
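The ordering logic above can be sketched with a Kahn-style topological sort. This is an illustration, not Terraform's actual graph code: it groups independent resources into levels that could run in parallel, and it fails loudly on circular dependencies, using the dependencies from Fig 2:

```python
from collections import deque

def parallel_levels(deps):
    """Topologically sort a DAG, grouping independent nodes per level."""
    indegree = {node: len(parents) for node, parents in deps.items()}
    dependents = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            dependents[parent].append(node)
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    levels, visited = [], 0
    while ready:
        level = sorted(ready)   # everything here has no unmet dependencies
        ready.clear()
        levels.append(level)
        visited += len(level)
        for node in level:
            for child in dependents[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    if visited != len(deps):    # leftover nodes mean a cycle
        raise ValueError("circular dependency detected")
    return levels

# Dependencies from Fig 2: the instance waits on the VPC, SG, and IAM role.
deps = {
    "aws_vpc.main": set(),
    "aws_iam_role.app": set(),
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_instance.web": {"aws_vpc.main", "aws_security_group.web", "aws_iam_role.app"},
}
print(parallel_levels(deps))
```

Here the VPC and IAM role are independent, so they land in the same first level (parallelizable), the security group follows the VPC, and the instance comes last.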
What happens when you run Terraform apply?
Now that we have a basic overview of how Terraform’s architecture works, let’s look at what happens when you run the terraform apply command.
The 3 common commands that any Terraform user runs when provisioning infrastructure are:
init - initializes Terraform: downloads provider plugins and performs initial housekeeping tasks.
plan - dry run: describes the operations to be performed.
apply - runs the operations.
We will understand the apply command in depth by using a simple aws_instance resource example:
resource "aws_instance" "web" {
  ami           = "ami-123456"
  instance_type = "t2.micro"
}
Terraform code is written in HCL (HashiCorp Configuration Language).
The above resource creation goes through these steps:

Fig 3. Terraform resource creation
Parsing the configuration: Terraform core reads your HCL file. Assuming init has already been run, the provider plugins are present to communicate with the provider. Terraform then parses the resource and identifies its type (aws_instance), its name (web), and all the attributes you’ve set (ami and instance_type).
Building the dependency graph (DAG): Terraform automatically builds a DAG of resources, inferring dependencies, sometimes even ones you didn’t explicitly declare. As seen earlier, each resource becomes a node, and dependencies form the edges.
Provider interaction: The AWS provider steps in, having been registered during the init call. It takes your resource definition and translates it into an AWS-specific API call. For an EC2 instance, this means preparing a request for the AWS RunInstances API.
Making the API call: Using your credentials (from environment variables, config files, or elsewhere; we will touch upon this topic later), the provider sends a request like this:
POST https://ec2.amazonaws.com/?Action=RunInstances
{
  "ImageId": "ami-123456",
  "InstanceType": "t2.micro",
  ...other default parameters...
}
Handling the response: Terraform handles the contract between the API call and what it shows the end user. If the call succeeds, Terraform parses the response metadata and extracts instance properties like the instance ID, public and private IPs, and more.
Updating state: Whenever any resource operation occurs, Terraform’s internal state changes. This change needs to be preserved for the next run. Depending on where you maintain state (local or remote), Terraform updates all the relevant attributes in the state file under aws_instance.web. This state is crucial, as it lets Terraform track what’s been created, detect drift, and reference attributes in other resources.
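For intuition, a heavily simplified state entry for this resource might look roughly like the following (the instance ID is illustrative; the real terraform.tfstate contains many more fields and should never be edited by hand):

```json
{
  "resources": [
    {
      "type": "aws_instance",
      "name": "web",
      "provider": "provider[\"registry.terraform.io/hashicorp/aws\"]",
      "instances": [
        {
          "attributes": {
            "id": "i-0abcd1234",
            "ami": "ami-123456",
            "instance_type": "t2.micro"
          }
        }
      ]
    }
  ]
}
```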
We assumed that Terraform can talk to the cloud provider through credentials stored somewhere. Let us take the example of AWS and see the different ways in which Terraform credentials are managed. Similar setups exist for other cloud providers.
Credential management
Terraform does not store credentials in the state file. They are passed securely at runtime using the following precedence:
Environment Variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
Credentials Files:
~/.aws/credentials
Other methods
Rather than storing permanent credentials on disk, you can use more secure methods that obtain temporary credentials, such as AWS STS assume role.
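For example, the AWS provider can assume an IAM role via STS directly in configuration, so Terraform runs with short-lived credentials; the role ARN below is a placeholder:

```hcl
provider "aws" {
  region = "us-east-1"

  assume_role {
    # Placeholder ARN: point this at a role scoped to your Terraform runs.
    role_arn     = "arn:aws:iam::123456789012:role/terraform-deployer"
    session_name = "terraform"
  }
}
```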
Terraform State locking
Terraform can maintain state locally or remotely. Local state involves writing state to an on-disk file. This is not recommended, as it keeps the record of shared cloud changes on someone’s laptop. This can be prevented by using central remote storage like AWS S3.
Using central storage brings up another problem: what if multiple users on the same team update the state in parallel? This could corrupt the state, with the last update overwriting any previous versions.
AWS S3 has a versioning feature that records each change. However, while versioning is useful for auditing who made the last change, it doesn’t prevent parallel changes.
To address this challenge, Terraform has a feature to enable state locking through services like AWS DynamoDB.
Locking ensures that only one user can modify the state at any given time.
How does it work?
Terraform stores a lock with a unique LockID in DynamoDB.
When a Terraform operation (like apply or destroy) starts, it attempts to acquire the lock.
If another operation is in progress and the lock is already held, Terraform will wait (and retry) until the lock is released.
Other cloud providers have DynamoDB equivalents, e.g. GCP has Cloud Storage object locks. The implementation details differ, but the concept is the same: whoever holds the lock updates the state.
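The core mechanism is a conditional write: create the lock item only if it does not already exist. Here is a toy in-memory Python sketch standing in for DynamoDB's conditional PutItem (the real Terraform lock item also carries operation metadata, not just an owner):

```python
import threading

class LockTable:
    """In-memory stand-in for a DynamoDB table used for state locking."""
    def __init__(self):
        self._items = {}
        self._mutex = threading.Lock()  # models DynamoDB's atomic conditional write

    def acquire(self, lock_id, owner):
        # Like PutItem with ConditionExpression "attribute_not_exists(LockID)":
        # the write succeeds only if nobody holds the lock yet.
        with self._mutex:
            if lock_id in self._items:
                return False
            self._items[lock_id] = owner
            return True

    def release(self, lock_id, owner):
        # Only the holder may delete its own lock entry.
        with self._mutex:
            if self._items.get(lock_id) == owner:
                del self._items[lock_id]

table = LockTable()
assert table.acquire("env/dev/terraform.tfstate", "alice")    # alice gets the lock
assert not table.acquire("env/dev/terraform.tfstate", "bob")  # bob must wait and retry
table.release("env/dev/terraform.tfstate", "alice")
assert table.acquire("env/dev/terraform.tfstate", "bob")      # now bob can proceed
```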
Example configuration
terraform {
  backend "s3" {
    bucket         = "my-tf-state-bucket"
    key            = "env/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
Benefits of using state locking
Prevents race conditions and state corruption.
Enforces serialization of infrastructure changes.
This setup is critical for production environments or any setup where multiple users or automated systems are managing infrastructure via Terraform.
Debugging Terraform
As we wrap up this overview of Terraform architecture and core concepts, it's important to know how to troubleshoot when things don’t go as planned. That’s where Terraform debugging comes in.
Whether you're dealing with unexpected plan results, authentication errors, or strange provider behavior, Terraform’s detailed structured logging helps you understand the underlying issues.
TF_LOG Levels
Terraform exposes different logging levels you can use to control the verbosity of output during command execution:
TRACE – Most detailed (everything).
DEBUG – Useful internal information (e.g., API calls, provider logic).
INFO – General progress messages.
WARN – Non-critical issues or warnings.
ERROR – Only errors that stop execution.
How to enable debug mode
You can set the log level with the TF_LOG environment variable:

TF_LOG=DEBUG terraform plan

To avoid cluttering your terminal, or to keep an on-disk record of logs, point the TF_LOG_PATH variable at a file:

TF_LOG=DEBUG TF_LOG_PATH=terraform.log terraform plan
What will you see in logs?
Terraform graph construction and execution.
Provider plugin handshake details.
Credential loading and authentication steps.
Cloud API requests and responses.
Backend configuration and state interactions.
End-to-end workflow diagram
Throughout this guide, we’ve explored how Terraform works under the hood - starting with how it parses .tf configuration files, builds a dependency graph (DAG), interacts with provider plugins via gRPC, and communicates with cloud APIs to provision infrastructure.
Now, to tie everything together, here’s a high-level diagram that illustrates Terraform’s end-to-end workflow, from reading your code to updating the state file:

Fig 4. Terraform workflow diagram
This workflow is at the heart of how Terraform delivers reliable, repeatable infrastructure automation across any cloud provider.
With this full picture in mind, you're well-equipped to not only write Terraform code but also understand how and why it works behind the scenes.
We have a team of experts who can help you streamline your infrastructure operations, SRE and Platform engineering initiatives. Reach out to us here
Separation of concerns: gRPC promotes language-neutral communication between Terraform core and plugins. This means that plugins can be written in languages other than Go.
Extensibility: Providers are binaries registered via gRPC and triggered by Terraform core. Adding a new provider is as simple as creating the plugin without touching the core engine.
This design keeps Terraform flexible and future-proof, letting the community build new providers without bloating the core.
The simplicity and modularity of Terraform plugin architecture is evident in cases where organizations have built Terraform providers beyond cloud infra use cases e.g. Datadog provider for setting alerts, Grafana provider managing Grafana operations, etc. Interestingly, some contributors have pushed the limits and even wrote providers to order Pizza. Check it out here.
The following diagram illustrates how Terraform core delegates cloud-specific tasks to provider plugins through gRPC.

Fig 1. Terraform core delegates cloud-specific tasks to provider plugins via gRPC.
Backend interface: Supports storing Terraform state and handling operations like apply and plan in a team setting.
DAG builder: Terraform core manages resource dependencies by creating a Directed Acyclic Graph (DAG). This ensures that infrastructure operations are ordered and predictable while leveraging parallelism where possible.
gRPC client and provider plugins: External binaries that interface with cloud-specific APIs.
Let us look at the DAG builder workflow in more detail to understand how Terraform leverages dependency graphs to make infrastructure operations predictable and performant.
DAG builder workflow
Terraform core constructs a DAG to represent the relationships and dependencies between resources. This ensures resources are created, modified, or destroyed in the correct order. For example, a security group must be created before an EC2 instance that uses it. The DAG allows Terraform to:
Parallelize independent resource operations
Detect circular dependencies
Maintain correct resource creation order
Each node in the DAG is a resource and edges define dependency relationships.
DAG visual representation

Fig 2. DAG visual representation
We need to provision aws_instance.web, which depends on the following resources:
aws_vpc.main — the VPC where the instance will run.
aws_security_group.web — the security group to be attached to the instance.
aws_iam_role.app — the IAM role associated with the instance.
The DAG flow ensures that the prerequisite infrastructure resources are created before the AWS instance.
This process leads to efficient and deterministic infrastructure provisioning.
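To make the ordering concrete, here is a small Python sketch using the standard-library graphlib (Terraform's actual implementation is in Go; the graph below mirrors Fig 2, and the security group's dependency on the VPC is an assumption for illustration). Nodes in the same batch have no unmet dependencies and could be processed in parallel:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each key depends on the nodes in its set.
deps = {
    "aws_instance.web": {"aws_vpc.main", "aws_security_group.web", "aws_iam_role.app"},
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_vpc.main": set(),
    "aws_iam_role.app": set(),
}

ts = TopologicalSorter(deps)
ts.prepare()  # also raises CycleError if a circular dependency exists

batches = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # nodes whose dependencies are all satisfied
    batches.append(ready)
    ts.done(*ready)

print(batches)
# The VPC and IAM role come first (independent of each other), then the
# security group, and the instance is created last.
```

The same walk also shows why cycle detection falls out of the DAG for free: prepare() refuses any graph that cannot be topologically ordered.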
What happens when you run Terraform apply?
Now that we have a basic overview of how Terraform’s architecture works, let’s look at what happens when you run the Terraform apply command.
The three common commands that any Terraform user runs when provisioning infrastructure are:
init - initializes Terraform: downloads provider plugins and performs initial housekeeping tasks.
plan - performs a dry run that describes the operations to be carried out.
apply - executes the operations.
We will understand the apply command in depth using a simple aws_instance resource example:
resource "aws_instance" "web" {
  ami           = "ami-123456"
  instance_type = "t2.micro"
}
Terraform code is written in HCL (HashiCorp Configuration Language).
The above resource creation goes through these steps:

Fig 3. Terraform resource creation
Parsing the Configuration: Terraform core reads your HCL file. Assuming init has already been run, the provider plugins are present to communicate with the provider. Terraform then parses the resource, figuring out the type (aws_instance), the name (web), and all the attributes you’ve set (ami and instance_type).
Building the Dependency Graph (DAG): Terraform automatically builds a DAG of resources, inferring dependencies - sometimes even ones you didn’t explicitly declare. As seen earlier, each resource becomes a node, and dependencies form the edges.
Provider Interaction: The AWS provider steps in, having been registered during the init call. It takes your resource definition and translates it into an AWS-specific API call. For an EC2 instance, this means preparing a request for the AWS RunInstances API.
Making the API Call: Using your credentials (from environment variables, config files, or elsewhere - we will touch upon this topic later), the provider sends a request like this:
POST https://ec2.amazonaws.com/?Action=RunInstances
{
  "ImageId": "ami-123456",
  "InstanceType": "t2.micro",
  ...other default parameters...
}
Handling the Response: Terraform handles the contract between the API call and what it shows the end user. If the call succeeds, Terraform parses the response metadata and extracts instance properties like instance_id, public and private IPs, and more.
Updating State: Whenever any resource operation occurs, Terraform’s internal state changes. This change needs to be preserved for the next run. Depending on where you maintain state (local or remote), Terraform updates all the relevant attributes in the state file under aws_instance.web. This state is crucial: it lets Terraform track what’s been created, detect drift, and reference attributes in other resources.
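The last two steps can be sketched as a toy in Python. The field names below are illustrative, not Terraform's real state schema, and the responses are made up:

```python
# Toy sketch: map a trimmed, hypothetical RunInstances response into
# resource attributes and record them under the resource address, the
# way Terraform tracks resources in its state file.
def extract_attrs(response):
    return {
        "id": response["InstanceId"],
        "private_ip": response["PrivateIpAddress"],
    }

state = {"resources": {}}
api_response = {"InstanceId": "i-0abc123", "PrivateIpAddress": "10.0.1.5"}
state["resources"]["aws_instance.web"] = extract_attrs(api_response)

# A later run can detect drift by re-reading the resource and diffing
# the refreshed attributes against what the state remembers:
refreshed = {"InstanceId": "i-0abc123", "PrivateIpAddress": "10.0.9.9"}
drift = extract_attrs(refreshed) != state["resources"]["aws_instance.web"]
print(drift)  # True: the private IP changed outside Terraform
```

This diff-against-recorded-state step is exactly what powers the change summary you see in a plan.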
We assumed that Terraform can talk to the cloud provider through credentials stored somewhere. Let us take the example of AWS and see the different ways in which Terraform credentials are managed. Similar setups exist for other cloud providers.
Credential management
Terraform does not store credentials in the state file. They are passed securely at runtime using the following precedence:
Environment Variables:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
Credentials Files:
~/.aws/credentials
Other methods
Rather than storing permanent credentials on disk, there are more secure methods to obtain temporary credentials, such as AWS STS AssumeRole.
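As a rough sketch of that precedence (the real AWS SDK credential chain has more steps, e.g. config profiles, SSO, and instance metadata), resolution looks something like this:

```python
import os

def resolve_aws_credentials(env=None):
    """Simplified precedence sketch: environment variables win over the
    shared credentials file. Illustrative only, not the SDK's full chain."""
    env = os.environ if env is None else env
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"source": "environment", "access_key_id": key}
    creds_file = os.path.expanduser("~/.aws/credentials")
    if os.path.exists(creds_file):
        return {"source": "credentials_file", "path": creds_file}
    return {"source": "none"}

# Environment variables take precedence when both are set:
fake_env = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "secret"}
print(resolve_aws_credentials(fake_env)["source"])  # environment
```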
Terraform State locking
Terraform can maintain state locally or remotely. Local state involves writing state to an on-disk file. This is not recommended, as it keeps the record of shared cloud infrastructure on one person’s laptop. This can be avoided by using central remote storage like S3.
Using central storage brings up another problem: what if multiple users on the same team update the state in parallel? This could corrupt the state, with the last update overwriting any previous versions.
AWS S3 has a versioning feature that records each change. However, versioning is useful for auditing who made the last change; it does not prevent parallel writes.
To address this challenge, Terraform has a feature to enable state locking through services like AWS DynamoDB.
Locking ensures that only one user can modify the state at any given time.
How does it work?
In DynamoDB, Terraform stores a lock item with a unique LockID.
When a Terraform operation (like apply or destroy) starts, it attempts to acquire the lock.
If another operation is in progress and the lock is already held, Terraform will wait (and retry) until the lock is released.
Other cloud providers have DynamoDB equivalents, e.g. GCP has Cloud Storage object locks. The implementation details differ, but the concept is the same: whoever holds the lock updates the state.
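The acquire-or-wait behaviour can be modelled with a toy in-memory lock table. DynamoDB implements the same idea with a conditional write that fails if an item with the same LockID already exists; this sketch is illustrative, not Terraform's actual client:

```python
class StateLockTable:
    """Toy stand-in for the DynamoDB lock table."""
    def __init__(self):
        self._items = {}

    def try_acquire(self, lock_id, owner):
        # Mimics a conditional put: succeeds only if the LockID is absent.
        if lock_id in self._items:
            return False
        self._items[lock_id] = owner
        return True

    def release(self, lock_id, owner):
        # Only the holder may release its own lock.
        if self._items.get(lock_id) == owner:
            del self._items[lock_id]

table = StateLockTable()
state_key = "env/dev/terraform.tfstate"

acquired_by_alice = table.try_acquire(state_key, "alice")  # True: lock was free
blocked_bob = table.try_acquire(state_key, "bob")          # False: alice holds it
table.release(state_key, "alice")
acquired_by_bob = table.try_acquire(state_key, "bob")      # True: lock released

print(acquired_by_alice, blocked_bob, acquired_by_bob)
```

In practice, a blocked client retries this acquire in a loop until the lock frees up, which is exactly the waiting behaviour described above.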
Example configuration
terraform {
  backend "s3" {
    bucket         = "my-tf-state-bucket"
    key            = "env/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
Benefits of using state locking
Prevents race conditions and state corruption.
Enforces serialization of infrastructure changes.
This setup is critical for production environments or any setup where multiple users or automated systems are managing infrastructure via Terraform.
Debugging Terraform
As we wrap up this overview of Terraform architecture and core concepts, it's important to know how to troubleshoot when things don’t go as planned. That’s where Terraform debugging comes in.
Whether you're dealing with unexpected plan results, authentication errors, or strange provider behavior, Terraform’s detailed structured logging helps you understand the underlying issues.
TF_LOG Levels
Terraform exposes different logging levels you can use to control the verbosity of output during command execution:
TRACE – Most detailed (everything).
DEBUG – Useful internal information (e.g., API calls, provider logic).
INFO – General progress messages.
WARN – Non-critical issues or warnings.
ERROR – Only errors that stop execution.
How to enable debug mode
You can set the log level with the TF_LOG environment variable:
TF_LOG=DEBUG terraform apply
To avoid cluttering your terminal or to keep an on-disk record of logs, also set the TF_LOG_PATH variable to a file path:
TF_LOG=DEBUG TF_LOG_PATH=terraform.log terraform apply
What will you see in logs?
Terraform graph construction and execution.
Provider plugin handshake details.
Credential loading and authentication steps.
Cloud API requests and responses.
Backend configuration and state interactions.
End-to-end workflow diagram
Throughout this guide, we’ve explored how Terraform works under the hood - starting with how it parses .tf configuration files, builds a dependency graph (DAG), interacts with provider plugins via gRPC, and communicates with cloud APIs to provision infrastructure.
Now, to tie everything together, here’s a high-level diagram that illustrates Terraform’s end-to-end workflow, from reading your code to updating the state file:

Fig 4. Terraform workflow diagram
This workflow is at the heart of how Terraform delivers reliable, repeatable infrastructure automation across any cloud provider.
With this full picture in mind, you're well-equipped to not only write Terraform code but also understand how and why it works behind the scenes.
We have a team of experts who can help you streamline your infrastructure operations, SRE, and Platform engineering initiatives. Reach out to us here.