DevOps Project Intake Checklist

Project Name

Date

Project Owner / Sponsor

DevOps Lead

Target Go-Live

Priority

Ticket / JIRA

Business Unit

01

Project Context & Stakeholders

Who owns this, what it does, and who approves changes

Purpose: Establish shared understanding before any infrastructure work begins. Misaligned expectations here cause the most expensive rework.

What is this project? STANDARD

High-level description: web app, data pipeline, ML model, internal tool, API service, etc.

Answer / Notes

Who is the business owner / sponsor? CRITICAL

The person accountable for budget and go/no-go decisions. Must be a named individual, not a team.

Answer / Notes

Who is the application-side technical lead?

Primary engineering contact for app-level decisions and architecture questions.

Answer / Notes

Who approves infrastructure changes in production? CRITICAL

Change Advisory Board (CAB) process, a named individual, or an approval ticket workflow?

Answer / Notes

What are the SLA / uptime requirements? CRITICAL

e.g. 99.9% uptime, RTO < 1h, RPO < 15min. Drives HA topology, replica count, and backup strategy.

Answer / Notes
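
An uptime percentage is easier to discuss once it is turned into a downtime allowance. The sketch below does that arithmetic. The helper name allowed_downtime is hypothetical, and the example assumes a 30-day month and a 365-day year.

```python
from datetime import timedelta


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime permitted over `period` for a given availability target.

    Hypothetical helper for intake discussions; e.g. 99.9 means 99.9% uptime.
    """
    if not 0 < availability_pct <= 100:
        raise ValueError("availability_pct must be in (0, 100]")
    unavailable_fraction = 1 - availability_pct / 100
    # timedelta * float rounds the result to the nearest microsecond.
    return period * unavailable_fraction


if __name__ == "__main__":
    month = timedelta(days=30)   # assumption: 30-day month
    year = timedelta(days=365)   # assumption: non-leap year
    for target in (99.0, 99.9, 99.95, 99.99):
        print(f"{target}%: {allowed_downtime(target, month)} per month, "
              f"{allowed_downtime(target, year)} per year")
```

At 99.9%, this works out to about 43 minutes per 30-day month and about 8.8 hours per year.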

What is the expected user base and traffic profile?

Internal only or external? Concurrent users? Peak load windows? Seasonal spikes?

Answer / Notes

What is the environment timeline?

When is dev needed? Staging? Production go-live? Any hard deadlines?

Answer / Notes

Are there any known dependencies on other teams or systems?

Shared databases, downstream APIs, external vendors, or blocked by another project?

Answer / Notes

02

Kubernetes & Cluster Access

We do not own the cluster — all access must be requested and approved

Important: We do not own the Kubernetes cluster. All namespace provisioning, RBAC, resource quotas, and network policies require approval from Aaron or Enterprise Ops. Do NOT assume access — request it early. Delays here block everything else.

Which cluster will this project run on? CRITICAL

Get the cluster name, API server URL, and environment (dev/staging/prod). Confirm it exists.

Answer / Notes

Who do we contact to request namespace creation? CRITICAL

Aaron? Enterprise Ops team? Ticket queue? Get the exact process and expected turnaround time.

Answer / Notes

Has a namespace been requested and approved? CRITICAL

Namespace name, cluster, environment. Do not begin CI/CD setup until this is confirmed in writing.

Answer / Notes

What RBAC roles are we being granted? IMPORTANT

view / edit / admin? Cluster-scoped or namespace-scoped? Who grants it and how?

Answer / Notes

What resource quotas are applied to our namespace? IMPORTANT

CPU and memory requests/limits, PVC storage, number of pods/services. Request increases before dev starts.

Answer / Notes
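
Before asking for a quota increase, it can help to check whether the planned replica count actually fits. The sketch below multiplies per-pod requests and limits by the replica count and compares the totals with the hard limits of a namespace ResourceQuota. The keys follow ResourceQuota naming (requests.cpu, limits.memory, pods). The parser handles only a common subset of Kubernetes quantity suffixes, and the function names and example values are hypothetical.

```python
from __future__ import annotations

from decimal import Decimal

# Subset of Kubernetes quantity suffixes (binary Ki/Mi/Gi/Ti, decimal k/M/G/T,
# and millicores "m"). Exponent forms such as "1e3" are not handled here.
_SUFFIXES = {
    "Ki": Decimal(2**10), "Mi": Decimal(2**20), "Gi": Decimal(2**30), "Ti": Decimal(2**40),
    "m": Decimal("0.001"), "k": Decimal(10**3), "M": Decimal(10**6),
    "G": Decimal(10**9), "T": Decimal(10**12),
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity like '500m' or '512Mi' to a plain Decimal."""
    # Two-letter suffixes are checked first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


def fits_quota(per_pod: dict[str, str], replicas: int, hard: dict[str, str]) -> dict[str, bool]:
    """Report, per quota key, whether `replicas` pods stay within the hard limit."""
    result = {}
    for key, limit in hard.items():
        if key == "pods":
            needed = Decimal(replicas)
        else:
            # Keys the pod does not set are treated as zero here; in a real cluster
            # the quota system may reject such pods unless a LimitRange sets defaults.
            needed = parse_quantity(per_pod.get(key, "0")) * replicas
        result[key] = needed <= parse_quantity(limit)
    return result


if __name__ == "__main__":
    per_pod = {"requests.cpu": "500m", "limits.cpu": "1",
               "requests.memory": "512Mi", "limits.memory": "1Gi"}
    hard = {"requests.cpu": "2", "limits.cpu": "4",
            "requests.memory": "2Gi", "limits.memory": "4Gi", "pods": "10"}
    for replicas in (4, 5):
        print(replicas, fits_quota(per_pod, replicas, hard))
```

Rolling updates temporarily run extra pods (maxSurge), and those pods count against the quota too, so it is worth checking the replica count plus the surge, not only the steady state.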

What Kubernetes version is the cluster running?

Affects API compatibility. Validate against Helm charts, CRDs, and operator versions.

Answer / Notes

What ingress controller is available? IMPORTANT

nginx, Traefik, ALB Ingress? Who manages it? Do we get a subdomain automatically?

Answer / Notes

Is there a service mesh? (Istio, Linkerd)

Affects mTLS, traffic routing, observability. If yes, are sidecars injected automatically?

Answer / Notes

Is there an internal container registry?

Harbor, ECR, Artifactory? Get push credentials before CI/CD setup begins.

Answer / Notes

Are there existing cluster-wide admission controllers? IMPORTANT

OPA/Gatekeeper, Kyverno? These may block deployments in non-obvious ways if policies are not met: for example, a Deployment can be accepted while its pods are rejected.

Answer / Notes

What is the node pool / node selector strategy?

Spot vs on-demand? GPU nodes? Taints/tolerations we need to configure?

Answer / Notes

03

Data Classification & Compliance

Determines cluster tier, encryption, audit obligations, and backup retention

CRITICAL: Data classification determines EVERYTHING — which cluster tier, encryption requirements, audit logging, network isolation, and backup retention period. Get this answered in the first meeting. Wrong classification = compliance failure.

What is the data classification level? CRITICAL

Public / Internal / Confidential / Restricted / PII / PHI / PCI / ITAR / CUI?

Answer / Notes

Does this handle PII (Personally Identifiable Information)? CRITICAL

Names, emails, SSNs, IP addresses. May trigger GDPR, CCPA, or other state privacy law requirements, depending on where the affected individuals are located.

Answer / Notes

Does this handle PHI (Protected Health Information)? CRITICAL

Medical records, diagnoses. Triggers HIPAA — dedicated cluster and BAA may be required.

Answer / Notes

Does this handle payment card data (PCI DSS)? CRITICAL

Card numbers, CVVs. Requires a PCI-compliant cluster and strict network segmentation. CVVs must never be stored after authorization.

Answer / Notes

Does this handle export-controlled or controlled unclassified data (ITAR / CUI)? CRITICAL

US government or defense-related data. May require FedRAMP authorization, a dedicated GovCloud deployment, or US-persons-only access.

Answer / Notes

What compliance frameworks apply? IMPORTANT

SOC 2, ISO 27001, NIST 800-53, HIPAA, FedRAMP, GDPR, CCPA?

Answer / Notes

Is encryption at rest required? IMPORTANT

LUKS, etcd encryption, encrypted PVCs, KMS-managed keys?

Answer / Notes

What audit logging is required? IMPORTANT

Who needs access to logs? How long must they be retained? (30d / 1yr / 7yr)

Answer / Notes

Are there data residency requirements? IMPORTANT

Must data stay in a specific country or region? Affects cloud region and cross-region backup strategy.

Answer / Notes

Data Classification → Infrastructure Tier Reference

Classification	Cluster Tier	Encryption	Audit Log Retention	Backup Retention
Public / Internal	Shared cluster	TLS only	30 days	7 days
Confidential / PII	Isolated namespace	TLS + at-rest	90 days	30 days
PHI / PCI / HIPAA	Dedicated cluster	Full + KMS	1 year	7 years
ITAR / CUI / FedRAMP	Gov-dedicated	FIPS 140-2	7 years	7 years
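
When a project touches more than one classification (for example, PII and PHI), one reasonable reading of the table above is that the strictest matching row governs. The sketch below encodes the table rows as data and applies that strictest-wins rule. The rule itself, the TierRequirements type, and the day counts used for "1 year" and "7 years" are assumptions for illustration, not documented policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TierRequirements:
    cluster_tier: str
    encryption: str
    audit_log_retention_days: int
    backup_retention_days: int


# Rows transcribed from the tier reference table, ordered least to most strict.
# Assumption: "1 year" = 365 days and "7 years" = 7 * 365 days.
TIERS = [
    ({"public", "internal"}, TierRequirements("Shared cluster", "TLS only", 30, 7)),
    ({"confidential", "pii"}, TierRequirements("Isolated namespace", "TLS + at-rest", 90, 30)),
    ({"phi", "pci", "hipaa"}, TierRequirements("Dedicated cluster", "Full + KMS", 365, 7 * 365)),
    ({"itar", "cui", "fedramp"}, TierRequirements("Gov-dedicated", "FIPS 140-2", 7 * 365, 7 * 365)),
]


def required_tier(labels: set[str]) -> TierRequirements:
    """Return the strictest tier matched by any label (assumed strictest-wins rule)."""
    normalized = {label.strip().lower() for label in labels}
    known = set().union(*(names for names, _ in TIERS))
    unknown = normalized - known
    if unknown:
        # e.g. "restricted" is listed as a classification but has no table row.
        raise ValueError(f"no tier defined for: {sorted(unknown)}")
    matched = [rank for rank, (names, _) in enumerate(TIERS) if names & normalized]
    if not matched:
        raise ValueError("at least one classification label is required")
    return TIERS[max(matched)][1]


if __name__ == "__main__":
    print(required_tier({"PII", "PHI"}))  # the PHI row wins over the PII row
```

Labels that have no row in the table, such as Restricted, raise an error instead of being mapped to a guessed tier, which surfaces the gap for the Security / Compliance Lead to resolve.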

04

CI/CD & Source Control

Pipeline toolchain, registry, deployment strategy, and secret management

Where is the source code hosted?

GitLab, GitHub, Bitbucket? Self-hosted or cloud? Group/org name and repo URL?

Answer / Notes

What CI/CD platform will we use? IMPORTANT

GitLab CI, GitHub Actions, Jenkins, ArgoCD, Tekton? Who manages the runners?

Answer / Notes

What is the deployment strategy? IMPORTANT

Rolling update, blue/green, canary? Who approves production deployments?

Answer / Notes

Do we use GitOps? (ArgoCD / Flux)

Is there an existing GitOps config repo? Who has push access to it?

Answer / Notes

What container registry will be used? IMPORTANT

Harbor, ECR, GCR, Artifactory? Push credentials needed before pipeline setup.

Answer / Notes

Will we use Helm charts or raw manifests?

Existing chart? Custom? Values files per environment? Chart repository location?

Answer / Notes

What environments need pipelines?

dev / staging / prod? Manual approval gates between stages?

Answer / Notes

What secret management solution is in use? CRITICAL

HashiCorp Vault, AWS Secrets Manager, Kubernetes Secrets, Sealed Secrets, External Secrets Operator?

Answer / Notes

Are there SAST / DAST / SCA scanning requirements? IMPORTANT

Checkmarx, Snyk, Trivy, OWASP ZAP? Must results pass before deploy?

Answer / Notes

05

Networking & Security

Ingress, egress, firewall rules, TLS, and zero-trust posture

What domains / subdomains does this project need? IMPORTANT

Who manages DNS? Route 53, Cloudflare, internal DNS? How long does provisioning take?

Answer / Notes

Who issues TLS certificates? IMPORTANT

cert-manager + Let's Encrypt, internal CA, purchased wildcard cert?

Answer / Notes

Does the app need outbound internet access?

If yes, what destinations? Proxy required? Egress firewall rules needed?

Answer / Notes

Does the app need to reach internal services / databases?

VPC peering, PrivateLink, VPN tunnel, service endpoints? Who requests connectivity?

Answer / Notes

Are NetworkPolicies required in the namespace? IMPORTANT

Default deny-all? Which pods can communicate with which? Who approves exceptions?

Answer / Notes
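
If the answer is default deny-all, the usual starting point is a NetworkPolicy with an empty podSelector and both policy types. The sketch below builds that object plus a DNS egress exception as JSON that kubectl apply -f accepts. It assumes the cluster DNS pods run in kube-system with the common k8s-app: kube-dns label (confirm with Aaron or Enterprise Ops), that namespaces carry the automatic kubernetes.io/metadata.name label, and that the cluster's CNI plugin enforces NetworkPolicy. The policy names and the my-app-dev namespace are hypothetical.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Deny all ingress and egress for every pod in `namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector = every pod in the namespace
            # Both types listed with no rules: nothing is allowed either way.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_dns_egress(namespace: str) -> dict:
    """Re-allow DNS lookups, which a deny-all egress policy otherwise blocks.

    Assumption: cluster DNS pods live in kube-system and carry k8s-app=kube-dns.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-dns-egress", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                # Namespace and pod selectors in one entry are ANDed together.
                "to": [{
                    "namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}},
                    "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
                }],
                "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
            }],
        },
    }


if __name__ == "__main__":
    ns = "my-app-dev"  # hypothetical namespace name
    manifest = {"apiVersion": "v1", "kind": "List",
                "items": [default_deny_policy(ns), allow_dns_egress(ns)]}
    print(json.dumps(manifest, indent=2))
```

Saved to a file, the output can be applied with kubectl apply -f. Every further flow the app needs then becomes its own explicit policy, which gives the exception-approval question a concrete artifact to review.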

Is a WAF (Web Application Firewall) required?

AWS WAF, Cloudflare, ModSecurity? Who manages the rule sets?

Answer / Notes

What authentication method does the app use?

OAuth2, OIDC, SAML, LDAP, API keys? SSO / enterprise IdP integration required?

Answer / Notes

Is image signing required? IMPORTANT

Cosign, Notary, SBOM generation? Who signs and who verifies at deploy time?

Answer / Notes

Are there vulnerability scanning requirements for running containers?

Continuous re-scanning of deployed images? Runtime threat detection (Falco, Aqua, Prisma Cloud)? What happens when a critical CVE is found in prod?

Answer / Notes

06

Storage & Databases

Persistent volumes, databases, object storage, and retention policies

Does the app need persistent storage? IMPORTANT

PVCs? Which StorageClass? RWO vs RWX? Initial size and growth estimate?

Answer / Notes

What database(s) does this project use? CRITICAL

PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch? Managed service or self-hosted in cluster?

Answer / Notes

Who manages the database? IMPORTANT

Provisioning, patching, backups, failover — platform team or app team? Clearly assign ownership.

Answer / Notes

Is object storage needed?

S3, MinIO, GCS, Azure Blob? Bucket naming, lifecycle policies, access control?

Answer / Notes

What are the data retention requirements?

How long must data be kept? Hot/warm/cold tiers? Legal hold requirements?

Answer / Notes

What is the expected data volume and growth rate?

Current size? Monthly growth rate? Used for storage class selection and cost projection.

Answer / Notes

07

Backup & Disaster Recovery

RTO, RPO, backup schedule, and tested restore procedures

CRITICAL: A backup strategy with no tested restore is not a backup strategy. Restore tests must be scheduled, executed, and documented before go-live. No exceptions.

What is the Recovery Time Objective (RTO)? CRITICAL

Maximum acceptable time to restore service after an outage: how long can the system be down before the business impact becomes unacceptable?

Answer / Notes

What is the Recovery Point Objective (RPO)? CRITICAL

How much data loss, measured as time before the failure, is acceptable? RPO = 0 requires synchronous replication.

Answer / Notes

What needs to be backed up? CRITICAL

Databases, PVCs, config maps, secrets, application state, object storage, Helm values?

Answer / Notes

What backup tool will be used? IMPORTANT

Velero (K8s), pg_dump, mysqldump, RDS snapshots, AWS Backup, Kasten K10?

Answer / Notes

What is the backup schedule? IMPORTANT

Hourly / daily / weekly? Incremental or full? Retention per tier (7d / 30d / 1yr)?

Answer / Notes
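
A quick sanity check ties the backup schedule back to the RPO. With periodic point-in-time backups, a failure just before the next backup completes loses everything written since the previous snapshot, so the worst case is roughly one backup interval plus the time a backup takes to finish. The sketch below applies that approximation. It ignores replication lag and assumes every backup succeeds, and the function names and example schedules are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta, backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Approximate worst-case loss for periodic point-in-time backups.

    Failure just before the next backup finishes loses one full interval of
    writes plus whatever was written while that backup was still running.
    """
    return interval + backup_duration


def meets_rpo(interval: timedelta, rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if the schedule's approximate worst-case loss stays within the RPO."""
    return worst_case_data_loss(interval, backup_duration) <= rpo


if __name__ == "__main__":
    rpo = timedelta(minutes=15)  # example target from Section 01
    schedules = {
        "daily dump": (timedelta(days=1), timedelta(minutes=20)),
        "hourly snapshot": (timedelta(hours=1), timedelta(minutes=5)),
        "5-minute log shipping": (timedelta(minutes=5), timedelta(minutes=1)),
    }
    for name, (interval, duration) in schedules.items():
        print(f"{name}: worst case {worst_case_data_loss(interval, duration)}, "
              f"meets RPO: {meets_rpo(interval, rpo, duration)}")
```

Under this approximation, neither an hourly nor a daily schedule can meet a 15-minute RPO; that generally points toward continuous approaches such as database log archiving with point-in-time recovery.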

Where are backups stored? IMPORTANT

Different region? Different cloud account? Air-gapped? Encrypted at rest?

Answer / Notes

Has a restore procedure been documented and tested? CRITICAL

Full restore test must be completed and signed off before production go-live.

Answer / Notes

Is multi-region or multi-AZ deployment required?

Active-active, active-passive, or single-region with cross-region backup?

Answer / Notes

Is there a DR runbook? IMPORTANT

Who declares a DR event? Who executes the runbook? Is the contact tree documented?

Answer / Notes

08

Observability & Monitoring

Metrics, logs, traces, alerts, and on-call rotation

What monitoring stack is available? IMPORTANT

Prometheus + Grafana, Datadog, New Relic, CloudWatch, Dynatrace?

Answer / Notes

What logging stack is in use? IMPORTANT

ELK, Loki + Grafana, Splunk, CloudWatch Logs? How do developers query logs?

Answer / Notes

Is distributed tracing required?

Jaeger, Zipkin, AWS X-Ray, Datadog APM? OpenTelemetry instrumentation needed?

Answer / Notes

Who receives alerts and what is the on-call rotation? CRITICAL

PagerDuty, OpsGenie, Slack? Who is primary? Who is escalation? After-hours coverage?

Answer / Notes

What are the critical alert thresholds? IMPORTANT

CPU > X%, error rate > Y%, p99 latency > Zms, disk > W%?

Answer / Notes

Are SLOs / SLIs defined?

Error budget? Who owns the SLO dashboard? Who reviews it and at what cadence?

Answer / Notes
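
For a request-based SLI (successful requests divided by total requests), the error budget is the share of requests the SLO allows to fail. The sketch below reports how much of that budget a window has consumed. The function name, the request-based SLI, and the example numbers are assumptions; time-based or latency SLIs would need a different calculation. The SLO is passed as a string so Decimal arithmetic avoids float rounding.

```python
from decimal import Decimal


def error_budget(slo_pct: str, total_requests: int, failed_requests: int) -> dict:
    """Error budget usage for a request-based SLO over one window.

    slo_pct="99.9" means at most 0.1% of requests in the window may fail.
    """
    if total_requests < 0 or failed_requests < 0 or failed_requests > total_requests:
        raise ValueError("request counts are inconsistent")
    slo = Decimal(slo_pct)
    allowed_failures = total_requests * (1 - slo / 100)
    if allowed_failures == 0:
        consumed = Decimal(0) if failed_requests == 0 else Decimal("Infinity")
    else:
        consumed = failed_requests / allowed_failures
    return {
        "allowed_failures": allowed_failures,
        "consumed": consumed,  # 1 means the budget is exactly spent
        "remaining": max(Decimal(0), 1 - consumed),
        "slo_met": failed_requests <= allowed_failures,
    }


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 failures, 99.9% SLO.
    print(error_budget("99.9", 1_000_000, 400))
```

In the example, a 99.9% SLO over one million requests allows 1,000 failures, so 400 failures consume 40% of the budget.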

How long must logs be retained?

30 days hot + archive? Compliance-driven retention rules apply here too.

Answer / Notes

09

Cost & Resource Planning

Budget, tagging, capacity estimates, and autoscaling

What is the monthly infrastructure budget? IMPORTANT

Hard limit or soft target? Who approves overages? Who gets cost alerts?

Answer / Notes

What cost center / cloud account does this bill to?

AWS account ID, GCP project, Azure subscription? Mandatory cost tags required?

Answer / Notes

What resource requests/limits should pods start with?

CPU request/limit and memory request/limit per container. Right-size from load test results.

Answer / Notes

Is autoscaling required?

HPA (CPU/memory), KEDA (event-driven), VPA, Cluster Autoscaler? Min/max replicas?

Answer / Notes
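
When choosing min/max replicas for an HPA, it helps to know roughly how the controller reacts. The Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no scaling action when that ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only that formula plus min/max clamping; the real controller also applies stabilization windows, scaling policies, and special handling for missing or unready pods, so treat the result as an estimate. The function name and example numbers are hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                         min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA estimate: ratio-based scaling, tolerance band, min/max clamp."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling action
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical: 3 pods averaging 90% CPU against a 60% utilization target.
    print(hpa_desired_replicas(3, 90, 60, min_replicas=2, max_replicas=10))
```

In the example, the ratio 90 / 60 = 1.5 takes 3 replicas to ceil(4.5) = 5. If max replicas is set too low, the clamp silently caps scaling during peak load, which is why the min/max values belong in this intake.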

Are spot / preemptible instances acceptable?

Acceptable for non-critical workloads. Not appropriate for stateful or latency-sensitive services.

Answer / Notes

Has a cost estimate been run for steady-state and peak?

Run it through Infracost (for Terraform-managed infrastructure) or the cloud provider's pricing calculator before provisioning begins.

Answer / Notes

10

Handoff, Documentation & Runbooks

Ensuring the system can be operated and handed off without the original author

Is there an architecture diagram? IMPORTANT

Draw.io, Lucidchart, C4 model. Must reflect actual deployed state, not aspirational design.

Answer / Notes

Is there a runbook for common operational tasks? IMPORTANT

Restart service, scale up/down, rotate secrets, trigger manual backup, check health.

Answer / Notes

Where is all documentation stored? IMPORTANT

Confluence, GitLab Wiki, GitHub Wiki? Link must be in the repo README.

Answer / Notes

Is there a defined on-call handoff process?

Who is on-call at launch? How are escalations handled after hours?

Answer / Notes

Has a go-live checklist been completed and signed off? CRITICAL

Infrastructure, security, monitoring, backup, and access all verified before launch.

Answer / Notes

Is there an incident response process defined?

Post-mortem template, severity levels, communication plan, RCA timeline.

Answer / Notes

✓

Sign-Off & Approval

This checklist must be signed before infrastructure provisioning begins

Note: All six roles must sign. This document is a living artifact — re-sign when scope changes or when promoting between environments (dev → staging → prod).

DevOps / Platform Lead

Signature / Date

Business Owner / Sponsor

Signature / Date

Security / Compliance Lead

Signature / Date

Infra / Cluster Admin (Aaron)

Signature / Date

Application Engineering Lead

Signature / Date

Enterprise / Program Manager

Signature / Date
