The Intersection of ClusterAPI and Infrastructure as Code
Cluster API is a Kubernetes project aimed at bringing Kubernetes-style declarative APIs to cluster lifecycle management. Pulumi aims at enabling developers and other professionals to leverage the power of general-purpose programming languages to declaratively define infrastructure-as-code, policy-as-code, and more. What happens when these two technologies are combined?
- Scott Lowe, Staff Architect, VMware
Hi everyone. My name is Scott and I am going to be talking for the next little bit about exploring the intersection of cluster A-P-I and infrastructure as code and what it might look like when you want to combine these two technologies together. So I hope that you find this session useful and interesting. And I hope that I’m able to share something that’s, you know, new that you haven’t seen before. So let’s get started. Here we go. All right. A quick blurb about me. I do strive to be a lifelong learner always looking at learning new things, which is one of the things that led me into Pulumi.
I was looking for practical ways to help expand my programming knowledge, where I’m still a newbie. So don’t laugh at my code. And I found that using Pulumi and writing general-purpose programming code for managing my infrastructure as code was a nice use case that kind of helped me get a little deeper into some of the programming languages. So obviously I am a Pulumi user. I started with TypeScript and then moved to Go. A little secret I’ll share with you. The reason I moved from TypeScript to Go was that there was a change in the Pulumi S-D-K that required TypeScript code to start using Promises and async stuff, and I totally didn’t and still don’t understand all of it and couldn’t make my code work. So I was like, okay fine.
I’ll just switch to Go. I wanted to learn more Golang anyway. So it worked out. I do work at VMware, came in via the Heptio acquisition, and my job there is to help folks with kubernetes, stand up kubernetes, optimize their kubernetes environments, that sort of thing. And sort of related to that, that means that I’m a big fan of cluster A-P-I. And I’ll talk more about what that is in just a moment. All right. So speaking of cluster A-P-I. What is cluster A-P-I? It is a project. It’s led by SIG Cluster Lifecycle. It’s a project to bring declarative kubernetes style A-P-Is to Cluster Lifecycle management.
So much in the same way that you would use a declarative A-P-I to say I want to run this container image and I want to have this many replicas of it and I want you to expose it on this port, and then kubernetes goes and uses its core reconciliation loop to ensure that what you’ve asked it to do is actually, you know, what’s happening, right, to reconcile desired state and actual state. The idea is we can use cluster A-P-I to bring the same style of declarative A-P-Is to managing Cluster Lifecycle.
So we can say I want there to be a cluster and I want it to have three control plane nodes and I want it to have a machine deployment that I can scale for worker nodes and so on and so forth, right? And then we apply those definitions, you know, stored in a YAML manifest, that declarative state, we apply that to what is known as a management cluster, and that’s a cluster that has all of the cluster A-P-I components and controllers and C-R-Ds and such installed. And then again through that core reconciliation loop, that management cluster then realizes the desired state of saying that a cluster exists and exists in this configuration, right? Cluster A-P-I was written in a way that allows you to use different IaaS providers.
So there’s a provider, a cluster A-P-I provider for A-W-S, a cluster A-P-I provider for vSphere, for Azure, and so forth. And as if it wasn’t confusing enough, we have cluster A-P-I, which we refer to as CAPI, and then the providers are the cluster A-P-I provider for A-W-S, so it’s called CAP-A, and then cluster A-P-I provider for vSphere, CAP-V, cluster A-P-I provider for Azure, CAP-Z, and so on and so forth. Normally when cluster A-P-I interacts with these IaaS platforms like A-W-S or Azure or whatever, it will go and it will create all of the necessary infrastructure that you need. So you’ll give it a manifest.
That manifest will define what the kubernetes infrastructure needs to look like, and then the provider knows what it has to create underneath that to support said kubernetes cluster. So on A-W-S, which is what I’ll be using today to show off how some of this stuff works, it would go and it would create a V-P-C and subnets and, you know, gateways and route tables and all the necessary jazz. And it’ll just do all that for you. And so the idea is that, you know, a user could then go and not have to worry about managing infrastructure. They can just do it all through cluster A-P-I. However, for a variety of reasons customers may want to consume their own infrastructure. They may already have existing A-W-S structures that they want to use.
They want to have cluster A-P-I simply use those instead of creating new ones. And so there is a model for supporting what we call bring your own infrastructure, right? And that would allow you to say, well here I already have a V-P-C and subnets. I want cluster A-P-I to use those instead of creating its own. And I’ll show you what that looks like. And in fact, that’s going to be, you know, a key sort of part of the entire presentation: how we can use a cluster A-P-I manifest, or cluster A-P-I itself, with infrastructure created using Pulumi for infrastructure as code. And we’ll look at the different ways to do that.
If you’re interested in more information about cluster A-P-I itself, this is just a simple high-level overview, go to the cluster A-P-I homepage at cluster-api.sigs.k8s.io or check out the GitHub repository there on the screen. Now, I want to show you real quick before I go on what it looks like to see a cluster A-P-I manifest. So let me switch to my demo screen here. Okay, here we go. And I have, here we go, a YAML manifest. This is a cluster A-P-I definition. This is a complete definition that will create an entirely independent kubernetes cluster.
And so you can see we have these custom resource definitions that cluster A-P-I uses, things like cluster and A-W-S cluster and kube A-D-M control plane, and we have, you know, various fields that we configure: what region is it going to be in for A-W-S? What S-S-H key is it going to use? And so on and so forth. We can specify replicas and versions and so on and so forth. And we can map that down to specific instance types. So we could have, you know, the control plane in this case, C-E-S demo control plane is the name of this object, and I’m mapping it to a T-3 large, because this is just a demo environment, but I can map it to, you know, an M-5 X-large or whatever here. And that gives you an idea of what’s going on, right? So now let’s flip back over to the presentation. There we go.
With that in mind then, you know, what does it look like to have cluster A-P-I use existing infrastructure, what we call the B-Y-O-I, bring your own infrastructure, model? So users can create their own infrastructure. They can use an infrastructure as code tool like Pulumi. They can create all the necessary pieces that are there and use all the best practices that they would want to use for infrastructure as code, right? And then you can integrate that. So, you know, that third bullet there on the screen, you know, is it possible to use I-a-C for B-Y-O-I with CAPI? Yes. Absolutely.
You can use infrastructure as code to manage the infrastructure that you are bringing into a cluster A-P-I environment. The answer to that fourth question, whether I can use even more acronyms than that in a single sentence, we’ll have to explore in some other session. Okay. Alright, so, let’s see what we got here. What information does cluster A-P-I need about the infrastructure that you’re bringing in, if you’re going to do that? So if you’re going to bring in your own infrastructure that you are managing through an infrastructure as code tool like Pulumi. The cluster A-P-I has to have some of that information so it knows that it’s not supposed to go out and create new infrastructure.
So what is the information that it needs? I’m speaking specifically here about A-W-S. So for other providers it may vary, but each of the providers is pretty well documented in terms of like if you’re going to do this on Azure, you should be able to check the Azure documentation for the cluster A-P-I provider for Azure and see what information is needed, right? For A-W-S, you have to have the V-P-C I-D, you have to have a list of subnets. Now, there’s two types of subnets. There’s public subnets and private subnets. And cluster A-P-I has a series of checks it uses to determine which is a public subnet which is a private subnet.
You need both, and typically a private subnet would be a subnet that has to use a NAT gateway to get to the internet, right? So it’s not exposing public I-P addresses. It’s not using an internet gateway. You have to go through a NAT gateway. So you have to have that list of subnets, and cluster A-P-I will prefer the private subnets to place the instances that it’s going to create. So these machines will be on private subnets. They won’t be exposed with a public I-P address and you won’t necessarily be able to, like, S-S-H to them directly. That means typically you’re going to have to have something like an S-S-H bastion host.
If you want access to the nodes there are other ways, of course, but an S-S-H bastion host is pretty common. And in that case you’re also going to need a list of additional Security Group I-Ds, because cluster A-P-I can create the bastion host for you if you want, but because you’re using existing infrastructure, you’re probably going to be co-locating this kubernetes cluster in a V-P-C or in subnets that may have other things there. And so you may already have an S-S-H bastion host and associated security groups. So we use this list of additional security group I-Ds to tell cluster A-P-I, put my instances into this security group, so they can receive traffic from the bastion host for example.
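The inputs listed so far (a V-P-C I-D, public and private subnets, and optional additional security groups) can be gathered into one small type and checked before anything is handed to cluster A-P-I. The following is only a sketch: the `BYOINetwork` type, its field names, and the I-Ds are hypothetical and do not come from the talk’s repository.

```go
package main

import (
	"errors"
	"fmt"
)

// BYOINetwork gathers the values the talk says cluster API needs on AWS.
// The type and field names are hypothetical, chosen for this sketch.
type BYOINetwork struct {
	VPCID                      string
	PublicSubnetIDs            []string
	PrivateSubnetIDs           []string
	AdditionalSecurityGroupIDs []string // optional, e.g. a bastion security group
}

// Validate reports the first missing required value.
func (n BYOINetwork) Validate() error {
	switch {
	case n.VPCID == "":
		return errors.New("missing VPC ID")
	case len(n.PublicSubnetIDs) == 0:
		return errors.New("need at least one public subnet")
	case len(n.PrivateSubnetIDs) == 0:
		return errors.New("need at least one private subnet")
	}
	return nil
}

func main() {
	n := BYOINetwork{
		VPCID:            "vpc-0123456789abcdef0", // placeholder ID
		PublicSubnetIDs:  []string{"subnet-aaa"},  // placeholder ID
		PrivateSubnetIDs: []string{"subnet-bbb"},  // placeholder ID
	}
	if err := n.Validate(); err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("BYOI inputs look complete")
}
```

Failing early on a missing subnet list is cheaper than finding out from the management cluster that the workload cluster cannot be placed.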
Now full details on what is needed to do bring your own infrastructure are at the U-R-L here on the bottom of this slide, and it will give you all the details on what you need to bring and what cluster A-P-I will create on its own, right? Basically, what you need to bring is a V-P-C, subnets and those security groups, right, and then cluster A-P-I will create E-L-Bs as needed. It’ll create instances and it’ll create additional security groups that it uses for its own purposes, to allow the kubernetes nodes to communicate with each other for example. That document also outlines specific requirements for A-W-S tags that are required by the kubernetes A-W-S cloud provider for it to function correctly.
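As a rough illustration of the tagging requirement, the sketch below builds the tag set for one subnet. The tag keys shown are the ones commonly documented for the kubernetes A-W-S cloud provider; treat them as an assumption and confirm them against the provider documentation mentioned above. The function name and the cluster name are hypothetical.

```go
package main

import "fmt"

// subnetTags returns the tags a pre-existing subnet is assumed to need so the
// Kubernetes AWS cloud provider can discover it. Verify the keys against the
// provider documentation before relying on them.
func subnetTags(clusterName string, public bool) map[string]string {
	tags := map[string]string{
		// "shared" marks a resource that outlives the cluster using it.
		"kubernetes.io/cluster/" + clusterName: "shared",
	}
	if public {
		tags["kubernetes.io/role/elb"] = "1" // internet-facing load balancers
	} else {
		tags["kubernetes.io/role/internal-elb"] = "1" // internal load balancers
	}
	return tags
}

func main() {
	fmt.Println(subnetTags("ces-demo", true))
	fmt.Println(subnetTags("ces-demo", false))
}
```

A map like this could be passed as the tags of each subnet resource in the Pulumi program that creates the infrastructure.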
When you do it, make sure that your Pulumi code that creates this infrastructure does assign those tags, or else the A-W-S cloud provider will fail to function properly and then your cluster A-P-I, your cluster, excuse me, won’t work like you expect it to work. Alright. So I’ve laid the groundwork for, you know, sort of what cluster A-P-I is, and how we do bring your own infrastructure, and, you know, have said, yes, you can absolutely do I-a-C with something like Pulumi and use that with cluster A-P-I. Now
I want to show you what that actually looks like. So we’re going to spend the rest of the time in this session probably another 10 to 15 minutes or so, actually looking at this through a set of real world examples, right, of potential ways that you can integrate an infrastructure as code tool like Pulumi with cluster A-P-I. I’ve taken all of the examples that I’m going to show you here. They’re all in a GitHub repository. You can see the U-R-L here. So it’s Github dot com slash ScottSLowe slash 2020 dash C-E-S dash I-A-C dash CAPI . I’m very imaginative when it comes to naming things like this.
So, you know, what we have here is 2020, Cloud Engineering Summit, infrastructure as code, cluster A-P-I, very imaginative. So let’s look at this in my terminal. I’m going to switch over to my demo terminal here. Alright. There we go. And I’ve already shown you what the base looks like, so, I’ll just pull this up again. This is the base configuration we’re going to be using throughout all of the different examples I’m going to show on how you can integrate something like Pulumi with cluster A-P-I and this is a bare bones cluster A-P-I manifest. I created it using the cluster A-P-I tool, cluster C-T-L. So I gave it some information like this is how I created it.
The information on the specific command that I ran for example, is in the README in the— in the repository— the GitHub repository that I just shared. And the first scenario that I’m going to show you is this manual scenario and so in that directory, I have a few files and this README here is where it’ll actually tell you what command I use, like right there, you can see here’s the cluster C-T-L, config cluster, blah blah blah, right? So take a look at that if you’re interested in sort of replicating this on your own. The C-E-S demo YAML, that’s the basic configuration we’re going to use. You’ll see a kustomization YAML here.
We’re going to use kustomize in one of the later scenarios so for now just ignore it, but I already have this stack and let me see if I can remember, actually look at my history here. Here we go. Okay. So I already have this stack that I am using. I’ve called it CAP-A full B-Y-O-I and it goes through and it creates all of the objects that are necessary to do bring your own infrastructure with cluster A-P-I on A-W-S. So it creates V-P-C, subnets, route tables, gateways, NAT gateways, all that kind of jazz, and then it exports these fields so that we can use them later on. And when it comes to integrating infrastructure as code, like what we’re doing here with this Pulumi stack, and cluster A-P-I, you could do it manually and you could use a tool like Y-Q or whatever.
Now I have an example of using Y-Q to pull this information out, right, and I’ve put it all into a script just for sort of ease of use. So let’s take a look at that. So I have some variables at the top that just make it easier later on. You can change these, and the README has information about what needs to be changed if you want to replicate this on your own. Keep in mind, this is a total hack. I wouldn’t recommend this for like, you know, real production sort of use, but it will work, and it will give you an idea of one way that you could integrate these two if you’re interested. And so I’m using this tool Y-Q.
There’s a link to that and its GitHub repository in the README, but what I do is I make a copy of the original and then I write these additional fields that are necessary for cluster A-P-I. One of these is this network spec dot V-P-C dot I-D, and then I use the Pulumi stack output command to reach into my project, into my stack, and pull out the V-P-C field that I exported in my code. And then I do the same thing for the public subnets and then the private subnets, or vice-versa actually, sorry, private subnets and public subnets. I tried to use a bash for loop here, but I kept getting errors. So I just hard-coded it, again this is an example.
It’s a hack, right? And what this will do is it will go through and write all the necessary fields that are needed for cluster A-P-I to use the existing infrastructure, the existing V-P-C and the existing subnets. This example does not write any additional security groups in there. I’ll do that in another example. So if I run this, it’ll take a minute or two to run while it goes and reaches into the stack and gets information out. And then now I see I have a new file called modified, and if I look at modified, at first it looks like it’s normal. But then when we get into the A-W-S cluster object here, so the second document, second YAML document in this file, you can see that this network spec with V-P-C and subnets is added there.
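For comparison with the shell script, here is a minimal Go sketch of the same "reach into the stack" step using the `pulumi stack output` command with its `--json` and `--stack` flags. The stack name `capa-full-byoi` and the output name `privateSubnetIds` are assumptions for illustration; the real names are whatever the stack in the repository exports.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os/exec"
)

// stackOutputArgs builds the arguments for
// `pulumi stack output <name> --json --stack <stack>`.
func stackOutputArgs(stack, name string) []string {
	return []string{"stack", "output", name, "--json", "--stack", stack}
}

// parseIDs decodes a JSON array of strings, the shape assumed here for a
// stack output that exports a list of subnet IDs.
func parseIDs(raw []byte) ([]string, error) {
	var ids []string
	if err := json.Unmarshal(raw, &ids); err != nil {
		return nil, fmt.Errorf("decoding stack output: %w", err)
	}
	return ids, nil
}

func main() {
	// Hypothetical stack and output names.
	args := stackOutputArgs("capa-full-byoi", "privateSubnetIds")
	out, err := exec.Command("pulumi", args...).Output()
	if err != nil {
		fmt.Println("could not read stack output:", err)
		return
	}
	ids, err := parseIDs(out)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, id := range ids {
		fmt.Println("private subnet:", id)
	}
}
```

Looping over the decoded list avoids hard-coding one line per subnet, which is the part the bash version struggled with.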
And if I go back to the C-E-S demo, the original, and I look there, they didn’t exist before. Right. So what we’ve done is we’ve modified the base configuration so that it has the information we need to use existing infrastructure. And if I were to apply this manifest against my cluster A-P-I management cluster, which is where the cluster A-P-I controllers and custom resource definitions live, then it would go and it would create a new what we call a workload cluster, a cluster that it’s managing via cluster A-P-I, and it would do so in the specified V-P-C and in the specified subnets, okay? There’s additional stuff by the way.
If you want to distribute your cluster across multiple availability zones, across multiple A-Zs, there are some additional fields you have to add. The control plane will do that automatically. So if we were to apply this then we would see the control plane, if there are multiple instances in the control plane, automatically distribute them across A-Zs. Worker nodes are a little different. All this is in the documentation upstream, the link that I gave you earlier, and I’ll show it on the screen again towards the end of the presentation so you have it. And then I could just use Kube C-T-L to then apply this manifest, right? I could just Kube C-T-L apply dash F.
I’m not going to do that just yet because I have another way of doing that that I want to show you, but this gives you an idea of one way you could do this, right? Now, the other way, one of the other ways, is here. And in this case, what I’m doing is I’m using a Go template that I created from the base configuration. And that’s this C-E-S demo dot T-M-P-L, and then I have some Go code that I wrote, and that Go code will render that template using information from the stack. And to pull the information from the stack, I’m using Pulumi’s automation A-P-I.
So let’s first take a look at the template. So this is a pretty standard, you know, Go template. At first you don’t see anything templated here, it’s all standard, right? But then we get into the network spec stuff, which is where we need it. So you see I have a reference to a V-P-C I-D and then a range over some subnet I-Ds. And then farther down here, you’ll see me use a field called hack. I’ll explain what that is in just a moment. There’s probably a better work-around than what I’m doing here.
But this is what I had to do for now to make it work. And then farther down I’ll show you this is where we add the additional security groups. So under this A-W-S machine template, we have the spec, the template, the spec and then additional security groups, and we would have a list there of any additional security groups that we needed to add. In this case there is only one and that’s going to be the bastion security group that will allow it to communicate with the S-S-H bastion. Now let’s look at the Go code that I wrote. Again be gentle. I am a newbie programmer.
So first I’ll define a struct that has the fields that I’m going to need and I’ll— there’s that hack field. I’ll come back to that in just a moment. I use the automation A-P-I to reach into my stack. I then pull out some values that I need. So the V-P-C I-D, the bastion security group, the public subnets, the private subnets, put those in a combined field and then down here the hack field. There’s a subsequent round of Go templating that cluster A-P-I does when it uses the template to create the cluster.
And so what I did here, because I kept getting errors in my Go templating that it didn’t understand what, you know, D-S metadata was, because I’m not passing that data to it, is I replace my temporary field with the ultimate field that cluster A-P-I will use. And so where in the template it sees hack, then when I make my templating round it will substitute D-S metadata local host name, which is what cluster A-P-I will use and require. So then the rest of the code is all straightforward. It just renders the template and off it goes. So let’s do this.
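The hack works because Go’s `text/template` does not re-evaluate the values it substitutes, so a field whose value itself contains braces comes out as literal text for the later templating pass. The sketch below shows that, plus one possible alternative that emits the braces from a quoted string inside the template. The exact spelling `{{ ds.meta_data.local_hostname }}` is an assumption about what "D-S metadata local host name" refers to, and the struct and constant names are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// values carries the placeholder field described in the talk.
type values struct {
	Hack string
}

// viaField substitutes the expression through a struct field (the "hack").
const viaField = `nodeRegistration:
  name: '{{ .Hack }}'
`

// viaLiteral emits the same text from a quoted string inside the template,
// so no extra field is needed. This is one possible alternative.
const viaLiteral = `nodeRegistration:
  name: '{{ "{{ ds.meta_data.local_hostname }}" }}'
`

// render parses and executes one template text.
func render(text string, data interface{}) (string, error) {
	t, err := template.New("snippet").Parse(text)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	a, err := render(viaField, values{Hack: "{{ ds.meta_data.local_hostname }}"})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	b, err := render(viaLiteral, nil)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(a)
	fmt.Println("both forms match:", a == b)
}
```

Either form leaves the braces untouched in the rendered YAML; the quoted-string form just keeps the workaround inside the template instead of in the Go struct.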
We know there’s no YAML file here, just the template, right? We’re going to do a go run main dot go. This will execute the code and it will use the automation A-P-I to reach into my C-E-S demo stack, pull out the values it needs, and then use Go templating to render the template. And so when I look here now, I have a YAML file based on that template, and if I look at that YAML file, whoops, helps to type, then we see YAML here. And we see that we have the V-P-C spec and the subnets listed, and if I go farther down here, under this configuration where you see name D-S metadata, that’s my hack, right, it replaced my hack field with a proper field that the cluster A-P-I round of Go templating will look for.
And then down here in the A-W-S machine template you can see that it populated the security group I-D that it needs to communicate with the pre-existing S-S-H bastion, and so now I could again use kube C-T-L apply dash F with this YAML file against my management cluster and it would go and create that. But I’m not going to do it yet, because I have one more thing I want to show you. So let’s go here. Okay. So in this last example that I want to show you, I have built on the previous example. I’m still using the automation A-P-I. In this case I have another local project that’s stored in this K8s directory, and it uses the kubernetes provider and its built-in kustomize support, and what I’m doing here is I’m templating out some kustomize overlays that will receive the values from the Pulumi stack.
Then the kubernetes provider will use kustomize to apply those overlays against the base configuration, and that’s defined in the kustomization dot YAML file, and that’s why we had a kustomization dot YAML file in the manual directory, because that’s the base configuration that it’s going to be applying against. And then the kubernetes provider will automatically apply that against my management cluster. So first, my S-S-H tunnel has probably timed out. So I’m just going to make sure that, yep, okay, so let’s re-establish that. Okay. There we go. Now you’ll see that I’m talking to my management cluster. I told it to do get clusters, and the fact that it doesn’t find anything just shows that there are no workload clusters defined in my management cluster.
So it doesn’t have any clusters defined. And after I run this code then we’re going to do this again, and we’re going to see a cluster there, which means that it has successfully generated the overlays and is creating the cluster using the information from the Pulumi stack to populate an existing V-P-C and existing subnets. But before we do that, let’s look at the code. So first, the main code here, this is just an iteration from the previous one. So I still have that same struct because I’m using a template. And then I reference my original stack, my CAP-A full B-Y-O-I stack, which generates all the underlying infrastructure, pull the values out that I need, just like I did before, then I define my templates.
These are what will become the kustomize overlays, and then it iterates over that list and generates the templates and then uses the automation A-P-I to drive that K8s stack, which uses the kubernetes provider and its kustomize support to automatically apply this against my management cluster. So let’s look at that. So this is a pretty traditional looking set of Pulumi code. I left some comments in there in case you want to try this but you don’t actually want to apply it. You can un-comment that provider field there and the render YAML to directory field and then change down here on line 21 that you want to use that other provider. That will just generate a set of YAML in a directory, the rendered directory on your local file system.
So you can see what kustomize is doing. I use that to test, but you might also want to use it just to see how it works before you actually try to apply it against an actual management cluster. So we’ve got our high-level Go program, which is using the automation A-P-I. It’s going to generate kustomize overlays and then drive the K8s project to actually apply those through kustomize. So let’s do a go run main dot go. The first part of this will run, it’ll take a couple of minutes, or about a minute, whatever, and you won’t see the output. At this point it’s going and it’s generating the kustomize overlays and getting all that prepped, and then in a moment, we’ll see the Pulumi progress streamer pop up. There we go.
And at this point it’s going to give us updates. So here it’s actually running the kustomize overlays against the base configuration and then automatically applying them against my management cluster. And it’ll take a minute or so and we’ll start seeing some objects populating here. Here we go. And what we’ll see when it’s populating objects is we’ll see it populating cluster A-P-I objects. So you’ll see a cluster object, an A-W-S cluster object, machine templates, you know, control planes, blah blah blah. Alright, so it creates all those items and says, okay, I’m done.
And if I now do my kube C-T-L, get clusters against my management cluster, bam, we have a cluster actually provisioning and you’ll have to take my word for it that it’s actually going into the V-P-C and subnets that we specified and not creating a new set of resources, right? So we are using Pulumi to create our base infrastructure and then driving— pulling that information out from that base infrastructure to give it to cluster A-P-I so that it can leverage that. Now you could obviously take this even farther with the automation A-P-I and you could— you could write a high-level Go program that runs the initial stack. So I ran the— or I created the C-E-S demo stack myself, right, but you could have it run the C-E-S demo stack then pull the values out and then run the K8s thing and it would be completely automated so you wouldn’t have to do anything.
You would just run the Go program and it would create the base infrastructure, and then you could parameterize it so that you could just reuse that over and over again, right? That would be cool. I haven’t gotten all the way there yet, but the skeleton of what you see here should give you an idea of like what that would look like. And Evan Boyle, his examples on the automation A-P-I are on GitHub. I didn’t include a link to that in the presentation, but definitely look at those if you’re interested in using the automation A-P-I. Alright, I’ll switch back to the slide deck now.
Okay, so that’s just a quick demo of some of the ways that you could integrate your infrastructure as code solution using Pulumi with cluster A-P-I. And so you can obviously adapt those or use those as a springboard to come up with other ideas or whatever. Just wrapping up then. Here’s a list of all the resources that I’ve referenced in the presentation. So all the links gathered together in one place. I also added links to my site where I’ve done some articles on cluster A-P-I and Pulumi, so if you follow those links, you’ll get a tags page that just shows all the articles that are tagged with that particular tag.
All the cluster A-P-I articles or all the Pulumi articles. Whoops. There we go. Okay. So thanks for watching. I hope that the session was useful. If you’re interested in getting in touch with me online I’m @Scott_Lowe on Twitter. Feel free to reach out. Also ScottSLowe on GitHub. Remember I am a new programmer, so don’t expect to find anything like, you know, Earth-shattering there, but I’m always looking for new resources. So if you are a more experienced programmer, and you have some resources you think I should take a look at then feel free to reach out to me or whatever. I’d love to hear from you. So thanks so much.