[Platform Engineering] Blog #1 "Pulumi Patterns and Practices (P3): A Pulumi-based reference architecture for large-scale organizations" #12414

Merged
merged 5 commits into from
Aug 5, 2024
119 changes: 119 additions & 0 deletions content/blog/platypus-platform-pulumi-at-1000-nodes/index.md
@@ -0,0 +1,119 @@
---
title: "The Platypus Platform: Pulumi for large-scale organizations"

date: 2024-08-05
draft: false
social_media: "TBD"
meta_desc: "The Platypus Platform is a comprehensive Pulumi-based internal platform for infrastructure management and secure deployments in a large-scale environment."
meta_image: meta.png
authors:
- troy-howard
tags:
- platform-engineering
- platypus-platform
- devsecops
- architecture
---

Infrastructure management is all fun and games until you find yourself scrolling through 1000+ resources in your AWS console. Worse, when one rogue product team wants to use Azure and your data team wants to be on GCP, you're ARM wrestling in Azure and watching your economies of scale tip the wrong direction as you're copy-pasting CloudFormation templates into yet another git repo... This. Needs. To. Be. A. Platform!

<!--more-->

And in that moment of overwhelm, you will be sold to, nurture-emailed every week, and told all your problems will be solved by implementing an IDP (internal developer portal, as if you've never seen this acronym before). An IDP that costs a lot of money and a lot of time to implement beyond default settings. An IDP that really only solves half of your problems. Your internal team offers to build something... something that feels more like welding together random pieces of code into an abstract found-art sculpture built from junkyard refuse, already 5 years out of date. How long will this investment be useful before you have to start over?

It's exhausting. If there were a good solution on the market, you wouldn't be reading this article. So let's talk about what you really need, and how Pulumi can help.

## An effective internal developer platform

There are quite a few listicles out there professing to authoritatively tell you the 5, or 7, or 11 essential components of an internal developer platform. Personally, I trust our customers to tell us, and here's what they have said they need:

**Consistency:** Bring some order to the chaos. As your company and your infrastructure grow, it gets more and more complicated to maintain consistency. You might already have established design patterns that you want to replicate, but don't have any way to encode those practices in your current tools. There's a lot of copy/paste of reusable blocks, but no way to apply [DRY principles](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) or to modularize/templatize the important parts (hint: all the parts are important!).
Contributor

Based on this amazing list...

I think we should either update this https://www.pulumi.com/what-is/what-is-platform-engineering/ or create a brand new page called "What is an internal developer platform". I like your points, they are slightly similar to the requirements of that page, but more colloquial in how they are written.


**Reproducibility:** Repeatable behaviors, who dat? If you run your deploy twice do you get the same results each time? What if you replicate your production environment to create a test environment, are they actually identical? How much more work does it take to get them to be? Will you get the same version of the training dataset every time you run your AI workloads? It’s anyone's guess. A lack of reproducibility slows down development, makes debugging more difficult, and makes that reuse we just talked about harder to achieve.

**Visibility:** When your node count and user count start to go beyond about 50-100 resources (computing or human), you quickly run into a problem of visibility. It can be very difficult to get a handle on what's happening, how many resources you have, where they are, and how much they cost. Any system that purports to manage 1000 nodes or more must have deeply integrated analytics, dashboards, and charts, and must be searchable across all your clouds, all your users, and every kind of resource.

**Security and Compliance:** Good fences make good neighbors. RBAC, policy-as-code, excellent secrets management, integration with your existing identity providers... These are the things you need to build security and policy guardrails you can rely on. Without them? It's just a powder keg of liability waiting to catch a spark.

**Auditability:** What happened and who did it? This is like a high-stakes game of [Clue](https://en.wikipedia.org/wiki/Cluedo). How quickly can you figure out who ran that bad deployment? Was it *Colonel Mustard* in the *library* with the *candlestick*? Or Blake the new Front-End Developer with overly broad permissions in AWS? You need to be able to answer these questions quickly. Quickly, like minutes, not hours or days. And it might have happened 6 months ago. Oof.

**Developer Experience:** In the ideal world, developers drive their own DevOps. The platform team provides self-service tools and streamlined workflows that allow your engineers to provision new resources, so your team doesn't have to. And you know, if the developers don't like the user experience, they won't use it at all, and will invent their own tools. You will have ROGUE SYSTEMS to hunt down and argue against in tedious overly-technical meetings. This is not what you want. We need to keep the developers happy to prevent this.

## A holistic view of the Platypus Platform

Pulumi has a broad surface area of [products and features](https://www.pulumi.com/product/) that address these needs. Designed with integration in mind from the beginning, our tools orchestrate well, presenting a smooth and streamlined workflow for both operations teams and developer teams.

We have an idea of how you can use all the Pulumi products together to deliver a comprehensive internal platform for security, infrastructure management, and deployments. Call it an internal platform for developer platform engineers (IPfDPE), if you want. We call it the realization of a vision we've been working hard to build for many years.

The **Platypus Platform** is a reference architecture that we will describe, and provide code for, throughout this series of articles. We'll dive deep into not just what you can do with our tools, but how to do it, and provide code for a reference implementation that you can use to jump-start the process.
Contributor

Should we just call it the "Platyform" as shorthand? Otherwise, "Platypus Platform" gets a bit long in how it's written.

Contributor Author

I decided to just call it Platypus, and refer to it as a "reference architecture for a Pulumi-based internal developer platform".

Contributor Author

Naming seems to still be an open question here.

Contributor Author

Going with Pulumi Patterns and Practices (P3).


Here's a quick overview to give you an idea of how we'll be addressing those needs in the Platypus Platform.

### Consistency

Pulumi can help bring consistency to your software catalog by encoding design patterns into reusable *[component resources](https://www.pulumi.com/learn/abstraction-encapsulation/component-resources/)* and by building custom *[organization templates](https://www.pulumi.com/docs/pulumi-cloud/developer-portals/templates/)* that provide a no-code or low-code way to start a new project. Templates help get projects off the ground faster and ensure consistent code structure, policy compliance, and best practices.

<figure>
Contributor

This looks like video of an older version of NPW.
So you'll want to use an updated video.

{{< video title="The New Project Wizard in Pulumi Cloud" src="https://www.pulumi.com/uploads/npw.mp4" controls="false" autoplay="true" loop="true" >}}
<figcaption><p>Figure: An internal developer portal using custom templates in Pulumi Cloud</p></figcaption>
</figure>

Beyond that, because Pulumi is [multi-cloud](https://www.pulumi.com/blog/deploy-to-multiple-regions/) (AWS, Azure, Google Cloud, and more) and [multi-language](https://www.pulumi.com/blog/pulumiup-pulumi-packages-multi-language-components/) (JavaScript, Python, Go, C#, Java), you can enjoy the same consistency across all your environments and all your developer teams, regardless of the languages they prefer or the cloud tooling they need.
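To make the component-resource idea from the start of this section more concrete, here is a minimal plain-Python sketch. It deliberately does not use the Pulumi SDK: `BucketSpec`, `secure_bucket`, and the tag names are hypothetical stand-ins. In a real Pulumi program the same pattern would typically live in a class derived from `pulumi.ComponentResource`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class BucketSpec:
    """Hypothetical, simplified description of a storage bucket."""
    name: str
    encrypted: bool
    versioned: bool
    tags: dict = field(default_factory=dict)


def secure_bucket(name: str, team: str, extra_tags: Optional[dict] = None) -> BucketSpec:
    """Stamp out a bucket that always follows the organization's pattern."""
    mandated = {"owner": team, "managed-by": "platform-team"}
    # Callers may add their own tags, but mandated tags win on conflict.
    tags = {**(extra_tags or {}), **mandated}
    return BucketSpec(
        name=f"{team}-{name}",  # consistent naming convention
        encrypted=True,         # not negotiable, so not a parameter
        versioned=True,
        tags=tags,
    )


bucket = secure_bucket("logs", "payments", {"env": "dev"})
print(bucket)
```

The design choice worth noting is what is *absent* from the function signature: encryption and versioning are not parameters, so a product team cannot forget them or turn them off.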

Another core aspect of consistency is *[drift detection](https://www.pulumi.com/docs/pulumi-cloud/deployments/drift/)*. Pulumi can automatically detect, and optionally remediate, cloud resources that have deviated from the expected state stored in Pulumi Cloud. This tech is better than ibuprofen at getting rid of developer-created headaches.
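Conceptually, drift detection is a comparison between the state you declared and the state that actually exists. The sketch below illustrates that comparison with plain dictionaries. It is not how Pulumi implements the feature, and the resource names and properties are made up for illustration.

```python
def detect_drift(expected: dict, actual: dict) -> dict:
    """Return {resource: {property: (expected, actual)}} for every mismatch."""
    drift = {}
    for resource, want in expected.items():
        have = actual.get(resource)
        if have is None:
            # The whole resource was deleted outside of the normal workflow.
            drift[resource] = {"<resource>": ("present", "missing")}
            continue
        changed = {
            prop: (value, have.get(prop))
            for prop, value in want.items()
            if have.get(prop) != value
        }
        if changed:
            drift[resource] = changed
    return drift


def remediate(expected: dict, actual: dict) -> dict:
    """Return a copy of `actual` with every tracked property reset to `expected`."""
    fixed = {name: dict(props) for name, props in actual.items()}
    for resource, want in expected.items():
        fixed.setdefault(resource, {}).update(want)
    return fixed


# Hypothetical example: someone opened a security group by hand in the console.
EXPECTED = {"web-sg": {"port": 443, "open_to_world": False}, "db": {"size": "large"}}
ACTUAL = {"web-sg": {"port": 443, "open_to_world": True}, "db": {"size": "large"}}

print(detect_drift(EXPECTED, ACTUAL))
print(detect_drift(EXPECTED, remediate(EXPECTED, ACTUAL)))
```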

### Reproducibility

Since 2010, scientists have felt that we are in a crisis – a *[reproducibility crisis](https://en.wikipedia.org/wiki/Replication_crisis)* – wherein we cannot easily reproduce an experiment in order to verify published results. Similarly, the software industry is entering into a reproducibility crisis of its own, especially around AI training workflows, where it is increasingly difficult to recreate crucial build and prod environments. [Pulumi Stacks](https://www.pulumi.com/learn/building-with-pulumi/understanding-stacks/) make it very easy to manage both configuration and state across multiple environments, and make [reproducing a deployment](https://www.pulumi.com/blog/simple-reproducible-kubernetes-deployments/) a matter of a few clicks within Pulumi Cloud.

You can use Pulumi programs to capture ***all*** of the necessary resources for an AI training workload, including things like [versioned data](https://www.pulumi.com/ai/answers/xig35anR7ibjAP5MhHDyxC/time-travel-queries-on-snowflake-dynamic-tables) using dynamic tables with time-travel functionality in [Snowflake](https://www.pulumi.com/case-studies/snowflake/). That means you can be sure that not only will your deployment be on the infrastructure you need, it will also have the exact version of data, every time, which is essential to A/B testing and debugging your models.
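One way to think about reproducibility is that a deployment should be a pure function of its configuration, including the data version. The plain-Python sketch below shows that idea under stated assumptions: the stack names, settings, and dataset path format are hypothetical, and the code does not call Pulumi or Snowflake.

```python
import hashlib
import json

# Hypothetical per-stack settings; the same program is used for every stack.
STACK_CONFIG = {
    "dev": {"instance_count": 1, "dataset_version": "2024-07-01"},
    "prod": {"instance_count": 6, "dataset_version": "2024-07-01"},
}


def plan(stack: str) -> dict:
    """Derive the desired resources purely from the stack's configuration."""
    cfg = STACK_CONFIG[stack]
    return {
        "workers": [f"{stack}-worker-{i}" for i in range(cfg["instance_count"])],
        # Pinning the dataset version keeps training runs comparable.
        "training_data": f"datasets/train@{cfg['dataset_version']}",
    }


def fingerprint(resources: dict) -> str:
    """Stable hash of a plan, useful for checking that two runs are identical."""
    encoded = json.dumps(resources, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


# Running the same stack twice yields the same plan.
print(fingerprint(plan("prod")) == fingerprint(plan("prod")))
# Dev and prod differ in size but train on the same pinned data.
print(plan("dev")["training_data"] == plan("prod")["training_data"])
```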

### Visibility

Every resource under management by Pulumi is visible within [Pulumi Insights](https://www.pulumi.com/product/pulumi-insights/). From this single-pane-of-glass interface, you can search for resources across all cloud environments. [Pulumi Copilot](https://www.pulumi.com/product/copilot/) provides a state-of-the-art AI chat interface to ask complex questions and get immediate results. Pulumi Insights' analytics give you the ability to identify anomalies or trends in resource usage and dig into cost, security, and compliance concerns.

{{< figure src="https://www.pulumi.com/uploads/pulumi-insights-search.gif" caption="Figure: Search for any resource with Pulumi Insights">}}

### Security and Compliance

In the modern parlance, when you say DevOps, you mean DevSecOps. Pulumi is designed to be secure by default. Pulumi Cloud offers full [role-based access control (RBAC) functionality](https://www.pulumi.com/docs/pulumi-cloud/access-management/teams/) including deep integration with [GitHub teams](https://www.pulumi.com/docs/pulumi-cloud/access-management/teams/#github-based-teams) and [SAML-based SSO](https://www.pulumi.com/docs/pulumi-cloud/access-management/saml/), managed secrets and flexibly defined secure environments with [Pulumi ESC](https://www.pulumi.com/product/esc/), and policy-as-code provided by [Pulumi CrossGuard](https://www.pulumi.com/crossguard/). Most importantly, all of these features are deeply integrated across the platform, creating an air-tight system with all the guardrails you need for managing security and access.
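If policy-as-code is new to you, the core idea is that each rule is an ordinary function that inspects a resource and reports a violation. The sketch below shows that shape in plain Python. The rule names and resource fields are hypothetical, and this is not the CrossGuard API; real policy packs are written against Pulumi's policy SDK.

```python
from typing import Callable, List, Optional

Resource = dict
Policy = Callable[[Resource], Optional[str]]


def no_public_buckets(res: Resource) -> Optional[str]:
    """Reject any bucket that is readable by the whole internet."""
    if res["type"] == "bucket" and res.get("public", False):
        return f"{res['name']}: buckets must not be public"
    return None


def require_owner_tag(res: Resource) -> Optional[str]:
    """Every resource must say which team owns it."""
    if "owner" not in res.get("tags", {}):
        return f"{res['name']}: missing required 'owner' tag"
    return None


POLICY_PACK: List[Policy] = [no_public_buckets, require_owner_tag]


def evaluate(resources: List[Resource], policies: List[Policy] = POLICY_PACK) -> List[str]:
    """Run every policy against every resource and collect the violations."""
    violations = []
    for res in resources:
        for policy in policies:
            message = policy(res)
            if message is not None:
                violations.append(message)
    return violations


proposed = [
    {"type": "bucket", "name": "logs", "public": True, "tags": {"owner": "data"}},
    {"type": "vm", "name": "web", "tags": {}},
]
for violation in evaluate(proposed):
    print(violation)
```

Because the rules are code, they can be versioned, reviewed, and tested like any other part of the platform.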

### Auditability

Every action a user takes in Pulumi can be tracked via the [audit log](https://www.pulumi.com/docs/pulumi-cloud/audit-logs/), which is searchable in two clicks from the Pulumi Cloud homepage dashboard. Audit logs can be filtered by user with one more click. Creating automated backups of your audit logs is a [first-class feature](https://www.pulumi.com/docs/pulumi-cloud/audit-logs/#automated-export). You will never have to worry about responding quickly when someone asks about an event that happened in your system. Also, each deployment and update has logs directly visible from the Pulumi Cloud app, regardless of how it was initiated.

{{< figure src="/images/docs/guides/self-hosted/auditlogs.png" caption="Figure: Viewing the audit log in Pulumi Cloud">}}
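The "who did it, and when?" question boils down to filtering events by user and time. As a rough illustration, the sketch below runs that query over an in-memory list. The event fields and sample entries are hypothetical and do not reflect the Pulumi Cloud audit log schema.

```python
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical audit events; a real export would contain many more fields.
AUDIT_LOG = [
    {"user": "blake", "action": "stack-update", "stack": "prod",
     "timestamp": datetime(2024, 2, 3, 14, 5, tzinfo=timezone.utc)},
    {"user": "mustard", "action": "secret-read", "stack": "prod",
     "timestamp": datetime(2024, 7, 30, 9, 0, tzinfo=timezone.utc)},
    {"user": "blake", "action": "stack-destroy", "stack": "staging",
     "timestamp": datetime(2024, 8, 1, 17, 30, tzinfo=timezone.utc)},
]


def find_events(log: List[dict], user: Optional[str] = None,
                since: Optional[datetime] = None) -> List[dict]:
    """Return events matching the given user and/or occurring at or after `since`."""
    return [
        event for event in log
        if (user is None or event["user"] == user)
        and (since is None or event["timestamp"] >= since)
    ]


# Everything Blake has done, including the update from six months ago.
for event in find_events(AUDIT_LOG, user="blake"):
    print(event["timestamp"].isoformat(), event["action"], event["stack"])
```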

### Developer Experience

Probably the most compelling aspect of Pulumi is the developer experience. [Developers love Pulumi](https://www.pulumi.com/testimonials/), because they get to use their preferred tools. General purpose programming languages, visual IDEs, command-line tools, and products with an API-driven architecture are what developers want, and it's what Pulumi delivers in spades.

With Pulumi templates and custom internal component resources in place, developers can drive their own DevOps, provisioning their own infrastructure resources and managing their own deployments directly, reducing bottlenecks in platform teams. Product engineering teams can self-serve with a streamlined workflow that stays compliant with company policy by default. Deep in the code of their favorite programming languages, your developers will never even know they are following the company rules.

{{< figure src="pulumi-ide.png" caption="Figure: Using C# to write a Pulumi program in VS Code">}}

### More to Come

So now that we’ve made a case for how Pulumi can be applied to meet the most pressing needs of a larger organization, hopefully you will realize that the Platypus Platform we will be presenting is more than just infrastructure-as-code. Pulumi is a platform for teams, where your developer portal is not just a catalog of software, but a fully functional control-plane across all your cloud environments.

Stay tuned for the following series of posts where we will use Pulumi to implement the Platypus Platform reference architecture for a fully-featured internal developer portal (IDP... or IPfDPE if you prefer).

And if you are ready to get your hands on Pulumi after this introduction, feel free to [create an account](https://www.pulumi.com/signup/) and follow some of our [Getting Started](https://www.pulumi.com/docs/get-started/) guides to see how easy simple use cases are, and begin to imagine how that same developer experience will scale up to your entire organization.

To learn more, you can watch the following video, which provides a high-level overview of how Pulumi works:

<div class="rounded-md shadow border border-gray-300 w-3/4 mx-auto my-4" style="position: relative; padding-bottom: 40.25%; height: 0; overflow: hidden;">
<iframe
src="//www.youtube.com/embed/Q8tw6YTD3ac?rel=0"
style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;"
allowfullscreen=""
title="Introduction to Pulumi in Three Minutes"
srcdoc="<style>*{padding:0;margin:0;overflow:hidden}html,body{height:100%}img{position:absolute;width:100%;top:0;bottom:0;margin:auto}</style><a href=https://www.youtube.com/embed/Q8tw6YTD3ac?autoplay=1><img src='/images/home/youtube-getting-started.png' alt='Introduction to Pulumi in Three Minutes'></a>">
</iframe>
</div>

## Pulumi Cloud

The Pulumi Cloud is a fully managed service that helps you adopt Pulumi’s open source SDK with ease. It provides built-in state and secrets management, integrates with source control and CI/CD, and offers a web console and API that make it easier to visualize and manage infrastructure. It is free for individual use, with features available for teams.

<a class="btn btn-secondary" href="https://app.pulumi.com/signup" target="_blank">Create an Account</a>