GPU Test Case: Deploying an NVIDIA T4/P4 GPU on Google Cloud with Terraform Automation for OS Installation
This project demonstrates the automated deployment of an NVIDIA GPU-optimized machine on Google Cloud using Terraform. The scripts were vetted, audited, and passed the terraform plan phase. Because of high costs, limited GPU availability, and an exceeded quota, the actual Compute Engine deployment did not proceed.
All the prerequisites and staging were successfully executed during the Terraform planning phase. This project highlights key expertise in infrastructure automation, GPU provisioning, and OS installation, and also aligns with a real-world Infrastructure Engineer role.
This project demonstrates expertise in:
- Infrastructure automation using Terraform for GPU-optimized virtual machines.
- Scripting to automate cloud resource provisioning and OS setup.
- OS installation with GPU drivers and CUDA support.
- Version control via GitLab for continuous development.
Step 1: Created and Set Up a GitLab Project
- Created a new GitLab repository named nvidia-gpu-os-install.
- Cloned the GitLab repository to a local machine using git clone.
- Navigated to the nvidia-gpu-os-install project directory.
Step 2: Created a Directory for Google Cloud Automation
- Created the google_gcp_automation/ directory.
- Navigated to the new directory, where the Terraform files are to reside.
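Steps 1 and 2 can be sketched as the short shell session below. The repository URL is not shown in this write-up, so REPO_URL is a hypothetical placeholder. When it is unset, the sketch creates the directory locally instead of cloning, which is an assumption made only so the remaining steps can be tried offline.

```shell
# REPO_URL is a hypothetical placeholder: the real GitLab URL is not shown here.
REPO_URL="${REPO_URL:-}"

if [ -n "$REPO_URL" ]; then
  # Step 1: clone the GitLab repository into nvidia-gpu-os-install/.
  git clone "$REPO_URL" nvidia-gpu-os-install
else
  # No URL supplied: create the project directory locally (assumption for dry runs).
  mkdir -p nvidia-gpu-os-install
fi

cd nvidia-gpu-os-install || exit 1

# Step 2: create the directory that will hold the Terraform files and enter it.
mkdir -p google_gcp_automation
cd google_gcp_automation || exit 1
```

To clone the real repository, export REPO_URL with its GitLab address before running the snippet.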
Step 3: Created Terraform Files
- Created the main.tf file, which defines the resources required to deploy an NVIDIA T4 GPU VM.
- Created the variables.tf file to define the input variables for the project ID, zone, and GPU type.
- Created the outputs.tf file to output the public IP address of the deployed VM.
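The project's actual Terraform files are not reproduced in this write-up, so the following is only a minimal sketch of what the three files could look like, written out with shell heredocs. The resource name gpu_vm, the instance name, the n1-standard-4 machine type, the Ubuntu image, the disk size, and the default zone are all assumptions chosen for illustration. The guest_accelerator and scheduling blocks reflect how the Google provider attaches a GPU to a Compute Engine instance.

```shell
# variables.tf: inputs for the project ID, zone, and GPU type.
cat > variables.tf <<'EOF'
variable "project_id" {
  description = "Google Cloud project ID"
  type        = string
}

variable "zone" {
  description = "Compute Engine zone (default is an assumption)"
  type        = string
  default     = "us-central1-a"
}

variable "gpu_type" {
  description = "Accelerator type to attach to the VM"
  type        = string
  default     = "nvidia-tesla-t4"
}
EOF

# main.tf: provider plus a single GPU-backed VM (all names are hypothetical).
cat > main.tf <<'EOF'
provider "google" {
  project = var.project_id
  zone    = var.zone
}

resource "google_compute_instance" "gpu_vm" {
  name         = "nvidia-gpu-vm"  # hypothetical instance name
  machine_type = "n1-standard-4"  # assumed; T4 and P4 GPUs attach to N1 machine types
  zone         = var.zone

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2204-lts"  # assumed OS image
      size  = 50                                  # assumed disk size in GB
    }
  }

  guest_accelerator {
    type  = var.gpu_type
    count = 1
  }

  # GPU instances cannot live-migrate, so host maintenance must terminate the VM.
  scheduling {
    on_host_maintenance = "TERMINATE"
  }

  network_interface {
    network = "default"
    access_config {}  # requests an ephemeral public IP
  }
}
EOF

# outputs.tf: expose the public IP of the VM.
cat > outputs.tf <<'EOF'
output "public_ip" {
  value = google_compute_instance.gpu_vm.network_interface[0].access_config[0].nat_ip
}
EOF
```

GPU driver and CUDA installation is not shown in this sketch; it would typically be handled by a startup script or by a GPU-ready image.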
Step 4: Defined the Project ID and Variables
- Created a terraform.tfvars file to store the project ID and other variable values.
- Updated the .gitignore file to exclude .terraform/ and other unnecessary files from being tracked in the repository.
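As a rough illustration of Step 4, the snippet below writes a terraform.tfvars file and appends ignore rules to .gitignore. The project ID is a placeholder, the zone and GPU type are assumed values, and the ignore entries beyond .terraform/ are common Terraform conventions rather than something this write-up confirms.

```shell
# terraform.tfvars: values for the input variables (all values are placeholders).
cat > terraform.tfvars <<'EOF'
project_id = "your-project-id"   # placeholder: replace with the real project ID
zone       = "us-central1-a"     # assumed zone
gpu_type   = "nvidia-tesla-t4"
EOF

# .gitignore: keep provider plugins and local state out of the repository.
cat >> .gitignore <<'EOF'
.terraform/
*.tfstate
*.tfstate.*
crash.log
EOF
```

Terraform loads terraform.tfvars automatically, so no extra flag is needed when planning.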
Step 5: Initialized Terraform
- Initialized Terraform in the google_gcp_automation/ directory to download the required provider plugins.
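The plugins that terraform init downloads are determined by the providers the configuration requires. The sketch below pins the Google provider in a versions.tf file. Both the file name and the version constraint are assumptions, since the write-up does not say whether the project pinned a provider version.

```shell
# versions.tf: declare the provider that `terraform init` should download.
# The file name and the version constraint are assumptions for illustration.
cat > versions.tf <<'EOF'
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 5.0"
    }
  }
}
EOF
```

With this file in place, running terraform init in google_gcp_automation/ fetches the hashicorp/google provider into .terraform/.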
Step 6: Planned the Terraform Deployment
- Generated a Terraform plan to preview the resources that would be created.
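The planning step might be wrapped in a small helper like the one below. The function name plan_gpu_vm and the tfplan output file are hypothetical, and running terraform validate first is an assumption about the workflow rather than something stated here. The helper expects terraform init to have been run already in the target directory.

```shell
# plan_gpu_vm DIR: validate and plan the Terraform configuration in DIR.
# Hypothetical helper; assumes `terraform init` has already been run there.
plan_gpu_vm() {
  dir="${1:-.}"
  if ! command -v terraform >/dev/null 2>&1; then
    echo "terraform is not on PATH" >&2
    return 127
  fi
  (
    cd "$dir" || exit 1
    # Check syntax and internal consistency, then save the plan for review.
    terraform validate &&
      terraform plan -input=false -out=tfplan
  )
}
```

Calling plan_gpu_vm google_gcp_automation from the repository root prints the resources that would be created without creating any of them.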
Project Summary
In this project, I demonstrated my ability to:
- Automate infrastructure using Terraform for GPU-optimized virtual machines.
- Script the deployment of cloud resources and OS setup using Terraform and Google Cloud tools.
- Perform OS installation with GPU drivers and CUDA support.
- Manage version control using GitLab for continuous integration and collaboration.
The project highlighted skills relevant to roles that involve bare metal GPU provisioning, automation, scripting, and cloud infrastructure deployment.
Additional steps performed to set up Git Bash and GitLab on the laptop to sync with the Google Compute Engine API and Terraform
- Enabled the Compute Engine API (Enable APIs and Services).
- gcloud auth login: authorizes Git Bash to log in to and manage Google Cloud Platform via the CLI.
- gcloud config set project [nvidia-tesla-t4-gpu-test]
- Set up GitLab and the Git Bash CLI for local development.
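The gcloud steps above could be collected into one helper, sketched below. The function name setup_gcloud is hypothetical. Enabling the API from the CLI with gcloud services enable and running gcloud auth application-default login (which gives Terraform's Google provider credentials to use) are assumptions about how the setup could be done, not steps this write-up confirms.

```shell
# setup_gcloud PROJECT_ID: authenticate and prepare a project for Terraform.
# Hypothetical helper; the last two commands are assumptions, see the lead-in.
setup_gcloud() {
  project_id="${1:-}"
  if [ -z "$project_id" ]; then
    echo "usage: setup_gcloud PROJECT_ID" >&2
    return 2
  fi
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud is not on PATH" >&2
    return 127
  fi
  # Log in interactively, select the project, and enable the Compute Engine API.
  gcloud auth login &&
    gcloud config set project "$project_id" &&
    gcloud services enable compute.googleapis.com &&
    gcloud auth application-default login
}
```

The helper is invoked with the project ID as its only argument and opens a browser window for each login step.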