Containers, Docker, and Kubernetes 101: A Conversation with Ryan Kenney

interview

February 19, 2020

Summary

Ryan Kenney, senior consultant at Coveros, chats with TechWell community manager Owen Gotimer about the difference between containers, container engines, and container orchestration; using containers in your CI/CD pipelines; and the cost of security.

Ryan Kenney 2:44

A good way to understand containers is to compare them to something you might already be familiar with, such as virtual machines and virtualization. With virtualization, you're abstracting away the hardware and encapsulating the OS. With containers, it's just the next layer up: you're virtualizing the process itself within the OS. So with virtual machines, you have multiple virtual machines running on the same hardware, but they think they have different hardware. With containers, you can have multiple containers running in one OS, but they have different views of what the kernel is. The kernel is the core of the OS, if you're not familiar. So in a nutshell, that's kind of what a container is.

Owen Gotimer 4:08

I think something people would be interested in at a high level, if they're just learning about containers or about this space in general, is this: what benefits do containers offer teams as they're trying to build their software?

Ryan Kenney 4:26

It's probably speed and resource utilization. I think you already sort of got resource utilization with virtualization: it was easy to spin up multiple machines on one set of hardware. With containerization, you can have much more fine-grained management of your system resources. It also takes a lot longer to spin up a VM than it does to spin up a container. It's less than 50 milliseconds just to spin up a container. It's not that fast for virtual machines. So, I mean, there's an immediate speed benefit there.

Owen Gotimer 5:24

Right, but containers don't solve the problem for everyone? Just because they provide the benefits of speed and resource utilization, and those are the two problems you're facing, doesn't necessarily mean containers are the right answer for you.

Ryan Kenney 5:44

Yeah, or that you'll get those things from it. So there's the concept that your application has to be easy to containerize in order for you to get some of those benefits. For example, if your application can't scale or doesn't work when scaled, then if you were to try, say, putting your application behind a load balancer, you don't get any benefit from it, because a load balancer works by scaling and shifting traffic to multiple copies. If your application doesn't support that, you're not going to get any benefit. The same is true with containers. You have to be ready to put your application in a container. One of the big ways to do that is to make sure your application as it is written isn't highly tethered to its deployment environment. If it's very aware of its surroundings and needs very specific things to be in certain places, then you're not going to get any of the benefits of containerization. Microservices, that's a buzzword people like to use, tend to fit the paradigm a little bit better because containers are easy to throw away. They're disposable. You just have a bunch of them running. You treat your containers like cattle, not like pets. So none of them are special. They're just all blind copies. Whereas if your application is not made to run that way, you're not doing microservices, and you have a very particular set of circumstances that make your application work, then you won't get any benefits.
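
One way to picture an application that is not tethered to its deployment environment is to have it read its settings from environment variables instead of hard-coded paths or hosts. The sketch below is only an illustration under that assumption; the variable names (APP_PORT, DATABASE_URL, LOG_LEVEL), the defaults, and the AppConfig class are hypothetical and do not come from the interview.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    """Settings the application needs, independent of where it is deployed."""
    port: int
    database_url: str
    log_level: str


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build the configuration from environment variables, with defaults.

    The variable names are hypothetical; the point is that nothing about the
    host machine is baked into the application code.
    """
    return AppConfig(
        port=int(env.get("APP_PORT", "8080")),
        database_url=env.get("DATABASE_URL", "sqlite:///:memory:"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


# Every copy runs the same code; only the injected environment differs.
config = load_config({"APP_PORT": "9000", "DATABASE_URL": "postgres://db.internal/app"})
print(config.port, config.log_level)
```

Because each copy gets everything it needs at startup, any one of them can be thrown away and replaced, which is the "cattle, not pets" idea described above.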

Owen Gotimer 7:33

So obviously one of the challenges of containers is that they can't be tethered to the specific environments that you may have previously built your legacy application on. One of the other challenges is the technical debt piece: making sure that the application itself is ready to be containerized. What are some other challenges, if there are any, that you have seen while transitioning to using containers?

Ryan Kenney 8:04

One of the issues with containers comes up if you're not using a tool to orchestrate them. If you're trying to manage your containers manually, and you're trying to run something more complicated than hello world, then you're probably going to run into issues. A lot of people are trying to start with the orchestration tool right off the bat, which poses its own set of challenges, but that at least solves that problem. But if you look back to when Docker was just sort of becoming popular, one of the problems we would see was that people wanted to use this cool technology, and they didn't know how to scale it.

Owen Gotimer

So you mentioned a couple of tools there. I want to start with the latter, which is Docker. What is Docker, how is it different from a container, and how is it different from what we'll talk about in a minute with container orchestration?

Ryan Kenney

It's interesting what people mean when they say Docker. Docker is a container engine. If you were to go to the internet and say, "I want to download Docker," what you're downloading is the Docker container engine. There are several competitors to that, but as far as I know, they're still not as popular. I don't think they have as much market share as Docker does. Docker runs containers, but a lot of times when people mean containers generally, they'll just say Docker containers, and colloquially it doesn't really make a difference. I mean, there can certainly be issues: if you are running something else, that can start to become a problem. But again, since most people are running Docker, people just use the terms Docker and containers interchangeably. So that might be where the confusion was.

Owen Gotimer 11:06

The next step beyond Docker: you have the containers, you have the container engine, Docker, running those containers, and then you have what you mentioned as container orchestration. The tool you mentioned was Kubernetes. That's not the only tool out there, but that's one of the more popular ones right now.

Ryan Kenney 11:31

So with Docker, since that's what we've been talking about, you can easily spin up containers. You say "docker run", and you give it an image, which is like a container template. Then you create an instance of a container, and if you want to do any more configuration, you have to specify all this stuff. Kubernetes is a clustered application. It's basically a bunch of machines, all running Docker, managed through a single interface, and that's the Kubernetes API. You interact with this API, and then you can spin up containers and say, "I want to have 10 copies of my application," because my application is sleek and robust, and I can load balance it. Let's say I've got a cluster with five machines, all running Docker. I could spin up 10 copies of it, and they can all communicate effectively, and they can all do what they need to do. That's all managed by Kubernetes. So I didn't have to do any networking. I didn't have to do any of the logic for load balancing. I didn't have to do any of that, because they're all Kubernetes constructs. It's all based on how Kubernetes itself runs. What I did was say, this is the Docker image I want, this is how many copies of it I want, and if I wanted to, I could say only run it on these nodes or only do this other stuff. But it has stuff it can do by default, and I don't have to be overly explicit to get something up and running.
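
The "this is the image I want, this is how many copies I want" request is what a Kubernetes Deployment object describes. As a rough sketch, the Python below builds such a manifest as a plain dictionary and prints it as JSON. The application name and image reference are placeholders, not anything named in the interview, and the helper function is hypothetical.

```python
import json


def build_deployment(name: str, image: str, replicas: int) -> dict:
    """Describe the desired state: which image to run and how many copies."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,                 # how many copies to keep running
            "selector": {"matchLabels": labels},  # which pods belong to this deployment
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Placeholder name and image, used only for illustration.
manifest = build_deployment("my-app", "registry.example.com/my-app:1.0", replicas=10)
print(json.dumps(manifest, indent=2))
```

Saved to a file, this JSON could be submitted with `kubectl apply -f`, which accepts JSON as well as YAML. Nothing in the manifest says which of the machines runs which copy; placement is left to Kubernetes unless you add constraints.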

Owen Gotimer 14:00

So Kubernetes really helps you manage the running of your containers and helps with scalability, so you can run these applications across a load of different containers. What are some of the challenges that you face in using container orchestration tools like Kubernetes or Red Hat's OpenShift?

Ryan Kenney 14:30

So it always depends on which variant you're using. For Kubernetes, I'd say probably the biggest challenge is security. Kubernetes is insecure by default, which is not to say that Kubernetes can't be secured, or that you have to bend over backwards in order to lock it down. But in order to drive adoption, basically they made it insecure, so it's easy to get something up and running and easy to get started. It takes a little bit of knowledge and understanding to get it fully secured, and I think that's a good model. We talked about Red Hat's OpenShift. They took a different approach. Their model is more secure by default, and part of the reason for that is they're not trying to drive widespread adoption and utilization of the tool, like Google is doing for Kubernetes. What they want instead is for you to pay for their support. That's their model. If you're getting OpenShift, a lot of times you get a Red Hat team to come set it up, help teach your people, and help manage it for you. So that's kind of what they want. But security still comes at a cost. No matter how you choose to address it, security is going to be a thing you have to worry about.

Owen Gotimer 16:01

Security is such a hot button topic right now as well. What are some steps that people can take as they start to use the containerization tools to make their applications and their containerization itself more secure?

Ryan Kenney 16:18

So you have to know the tools that are in use. A lot of problems can't be solved without the help of tools, so knowing where to get started is probably a good first step: looking up security tools that are more container friendly and are built to handle this. Just off the top of my head, Twistlock is a good one. They're kind of an all-in-one. They do monitoring. They do host scanning, so they scan your Kubernetes nodes. They also scan running containers on your cluster. They also have a static analysis tool that can scan your images. That's less useful for the cluster and more useful for the applications you're building that you want to run on the cluster. There are also tools like Sysdig, Falco, and Aqua. I don't know if they're direct competitors, just because they're not quite the same all-in-one suite. If you just Google those words and Kubernetes, you'll be on the right track. People have written plenty of stuff out there comparing the tools and pricing and stuff like that.

Owen Gotimer 17:44

As teams start to use the containerization, Kubernetes, Docker, etc., a lot of these teams are using it to help facilitate their CI/CD pipelines. When we're building those CI/CD pipelines to help facilitate our applications and whatnot, security obviously plays a big role in that. How can Kubernetes and Docker and containerization, in general, help facilitate your CI/CD pipeline?

Ryan Kenney 18:22

When you say you want to use containers, and Kubernetes, and stuff like that, there are two ways you can use them. We talked about the first, which is using them for your application. As far as how that affects the pipeline, you're building a different artifact, so whatever CI/CD tool you're using needs to know how to build Docker images. You also have to have a place to store them.

But probably more interesting is using those technologies in the pipeline itself. A pipeline is just going from source code to deployment with quality gates in between and doing a whole bunch of cool stuff: running tests and whatnot. But it's all just executing stuff. I have a task that runs my static analysis, I have a task that runs my unit tests, and that's all just code being executed somewhere. In Jenkins, it's getting run on what's called a Jenkins agent, which is doing the actual work. So when you talk about using Kubernetes and Docker containers in your CI/CD pipeline, what we're really talking about doing is taking all that executable work and offloading it into containers. And why is that important? If you've ever managed Jenkins or had to administrate Jenkins, your agents end up getting fairly loaded down with all the various tools that developers want and need. You generally have developers that have to ask an operations person to install some stuff, just because you can't trust the developer to install stuff there. Again, these are static agents, so if they break something, then someone in ops has to go and fix it. Then you have the silo. With Kubernetes, it's interesting because Jenkins integrates nicely with Kubernetes, and what you can do is have the Jenkins Master talk to a Kubernetes cluster and spin up containers on demand. To run tasks in your CI/CD pipeline, all you have to do is provide the Docker image (remember, that was a template for running a container). So this actually allows developers to work more closely in the ops space, in that they can define the environment in which they want the pipeline to run, or at least a specific task to run. They're no longer just making some change to the application, throwing it over the wall, and saying it's ops' problem. They can be involved in that process as well.
So it's better use of your resources like we talked about before, but it's also I think culturally, it's breaking down the barriers a little nicer. Again, a tool won't force you to do that. But in this case, it certainly doesn't hurt.
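
To make "offloading a pipeline task into a container" concrete, here is a minimal sketch that assembles the `docker run` command for a single task. It uses the plain Docker CLI rather than the Jenkins and Kubernetes integration described above, and the image name, task command, and workspace path are placeholders chosen for illustration. The helper function is hypothetical.

```python
import shlex
from typing import List, Sequence


def docker_task_command(image: str, task: Sequence[str], workspace: str) -> List[str]:
    """Build the argument list that runs one pipeline task in a throwaway container."""
    return [
        "docker", "run",
        "--rm",                           # remove the container when the task ends
        "-v", f"{workspace}:/workspace",  # mount the checked-out source code
        "-w", "/workspace",               # run the task from inside that mount
        image,                            # the image defines the task's tooling
        *task,
    ]


# Placeholder image, task, and path: a unit test step that needs only Python.
cmd = docker_task_command("python:3.12-slim", ["python", "-m", "unittest"], "/home/ci/build")
print(shlex.join(cmd))
```

On a machine with Docker installed, passing `cmd` to `subprocess.run(cmd, check=True)` would execute the task. The tooling lives in the image rather than on a long-lived agent, so a developer can change the task's environment by changing the image.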

Owen Gotimer 21:08

So it really helps connect the dev and the ops to form that DevOps, and one of the buzzwords that we're hearing right now is DevSecOps. You touched on security and how Kubernetes has some security vulnerabilities and Red Hat is built more securely, but maybe a little bit more difficult to manage in house. You mentioned a couple security tools. Can those security tools flow right into that CI/CD pipeline being orchestrated by your containerization tools?

Ryan Kenney 21:41

Theoretically, any tool that you're talking about. You don't have to have an application that runs as a Docker container in order to use Docker containers in your CI/CD pipeline. I think it would help, or rather it's a good direction to go, but you can still get value without it. If you're using a newer tool like Twistlock to do your container scanning, you can integrate that right into your pipeline. But you can also integrate any other security scanning tools that you were using as well. The key thing is just that you have a Docker image that's taking some inputs, like your application source code or your application binary, running the tool on it, and then producing results.

Owen Gotimer 22:26

I heard a stat. I don't want to quote the stat because I don't remember it specifically. But it was some ridiculous number of companies that don't even do the bare minimum security scanning on their applications. It's like 95% don't do even the most basic security scanning. So I think even the idea of being able to integrate these security tools into a pipeline that is running in the virtual space and orchestrated by Kubernetes, so you don't have to remember to click that button every time and it's just part of the pipeline, is going to help a lot of companies out.

Ryan Kenney 23:03

Well, the flip side of that is there are places that are running security scanning tools. Running them. That's it. They don't look at the results. Or if they do, they don't really act on them or do anything. Sometimes there are good reasons, like, "Hey, there is this vulnerability, but, you know, we've done some analysis and we don't think this should break the pipeline." Okay, you looked at it, that's fine. Then there are less savory cases where it's like, "We really need to go to production. This is a legit issue, but we're probably going to fix it before it becomes a problem." And then there are places that are just, "Yeah, I don't care, just release it." For whatever reason, they see the results are there, and they're just choosing to ignore them. So some places recognize what they're doing is bad. Other places think, "Hey, we're running the security scanning tools, and my job performance is measured by what tools we're running, not what we're doing about the results, so it's okay by me."
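
The difference between running a scanner and acting on its results can be expressed as a quality gate. The sketch below assumes a made-up finding format (an identifier plus a severity) and a hypothetical list of findings that someone has reviewed and accepted. It is not the output format of Twistlock or any other real tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Hypothetical severity scale, lowest to highest.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    identifier: str
    severity: str


def blocking_findings(findings: Iterable[Finding], threshold: str, accepted: Set[str]) -> List[Finding]:
    """Return findings at or above the threshold that nobody has reviewed and accepted."""
    limit = SEVERITY_ORDER[threshold]
    return [
        f for f in findings
        if SEVERITY_ORDER[f.severity] >= limit and f.identifier not in accepted
    ]


# Made-up scan results; VULN-002 was analyzed and judged acceptable for now.
findings = [
    Finding("VULN-001", "critical"),
    Finding("VULN-002", "high"),
    Finding("VULN-003", "low"),
]
blockers = blocking_findings(findings, threshold="high", accepted={"VULN-002"})
gate_passed = not blockers
print("gate passed" if gate_passed else f"gate failed: {[f.identifier for f in blockers]}")
# prints: gate failed: ['VULN-001']
```

The accepted set stands in for the "we've done some analysis" case, where skipping a finding is a recorded decision. Everything else at or above the threshold stops the pipeline instead of being silently ignored.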

Owen Gotimer 24:15

It all comes down to what the business is trying to accomplish, what problems they have, and what challenges and risks they are willing to take. It's important that we provide developers and testers, as people in the development lifecycle, with the information to make decisions based on the security vulnerabilities we've found or the security risks we've highlighted for them. For people who are new to this space and are maybe getting their first real glimpse into containers and Docker and Kubernetes and all of these different tools, what are some steps they can take and some resources they can use to get started?

Ryan Kenney 25:03

So there are a few nice answers and a few not nice ones. The short version is you're probably going to have to spend money. But there are free resources out there. The Kubernetes documentation is pretty good. There's the Linux Foundation. I believe they have a free intro to Kubernetes course. There's one you pay for that's meant to be a 40-hour course, and then there's a free one, but it's like a couple of hours or something, just to give you a taste. They produce good quality stuff.

Shameless plug: we do training on Docker and Kubernetes, and we also do coaching and help get stuff set up. So, you know, there are external consulting companies. Again, the best options are probably going to involve spending money for your organization. If you're a highly motivated individual and you're looking to learn, where there's a will, there's a way. There is free documentation out there. Kelsey Hightower does Kubernetes the Hard Way. It's a Git repo that has you install a Kubernetes cluster using zero automation, so you really learn the ins and outs of it. So there's lots of cool stuff out there that's free, but it takes a lot more work. If you're a manager and you're looking to get a team onboarded, you're probably better off with one of these courses or something, just because it's less work and you know what you're getting and how you're doing it.
