The essence of DevOps

Introduction

The first time I heard the word "devops" was in late 2010. At that time I was working on a research project at my university and I couldn't really understand what it meant. The next year I started working at a small company here in Rio de Janeiro. We had a small team and our most experienced developers were responsible for deploying our application. Our team started growing and we thought it would be better to have someone with expertise in system operations handle everything. That was a terrible idea... In this post I would like to share some of my thoughts about devops and how I believe it has a lot in common with agile software development.

About DevOps

I did some research to find out when the development community started talking about devops. An article from the 2008 Agile Conference written by Patrick Debois (@patrickdebois) offers some insights into how his infrastructure team experimented with treating their applications as clients of their infrastructure.

Each application was seen as a customer for the datacenter. This way, they started to compile their backlog, with different priorities assigned.

Until then there was a big separation between the development and operations/infrastructure teams. In Patrick's article we can see the beginning of a new mindset.

Digging deeper I found a presentation from the 2009 Velocity Conference by John Allspaw (@allspaw) and Paul Hammond (@phammond). At the time they were working at Flickr, and their groundbreaking point was that they were doing at least 10 deploys per day of their site.

Try to imagine the repercussions of that. Until then a deploy was like a special event. It was the moment when developers left their sanctuary and went down to the operations center to see their creation brought to life by the hands of the operations guys, who couldn't understand how precious the product was.

"Ops who think like devs. Devs who think like ops."

What they presented was a new way of seeing things. The developer's job is no longer just to add new features to the system, and the operations job is no longer just to keep the system up and running. Both developers and operations have the same goal: enable the business.

Isn't that what agile software development says? That people should collaborate in order to enable the business and respond to change in a way that keeps the customer always satisfied? Why has the operations guy been left out of this?

What I learned

Let me come back to my story about how bad our idea was of hiring someone just to take care of our deployments and production environment. The problem wasn't having someone to do the job. The problem was the team's mindset about him and his mindset about the product.

From the development team's point of view, we had an operations guy who had to do everything to keep our applications running. We didn't care if our logging was bad, if we didn't provide any monitoring information, or if our deploy process was exhausting and error prone. All we wanted from him was that he kept the application running well.

From the operations guy's point of view, we were a bunch of suckers who were always changing something and who didn't care about the stability and performance of the application.

See? We created the wall that has been torn down in other companies. Our previous commitment to deliver the product to the client turned into delivering it to the operations guy. And the operations guy's commitment was to deploy the application and deliver to the client the developers' product, not his.

I don't need to say that this didn't work very well. I don't have any statistics about how this affected the quality of our product, but I can say that it got worse. We stopped worrying about expensive queries, memory leaks and everything else; after all, those were the operations guy's concerns. Let him suffer and create a database index, or let him be woken up in the middle of the night to restart the server; it's not our job as developers. Even our personal relationship with him was affected. We didn't feel like the same team.

After a lot of meetings and planning we decided to stop using the operations guy. We went back to doing our own deploys and keeping our applications running. I can't say that it was a bad idea, but I believe it wasn't the smartest one. After all, why couldn't we use all of the operations guy's knowledge and experience with servers and networks to our benefit?

Today, when I look back and think about it, I believe that we should've brought the operations guy into our team. We could have taught him more about our product and let him suggest how we could optimize queries, avoid memory leaks and improve our deployment process through automation. I believe he would then have felt more committed to the product, and we would have started to treat him as one of us.

Conclusion

How does everything I said fit into agile software development? Should every team have an operations guy inside it?

Well, I can't say that having an operations guy inside the team would be a bad thing, but we know that sometimes it is not possible due to company budget or culture. What I believe is that you as a developer should put yourself in the operations guy's place. Don't think of him as someone who is trying to stop you from being innovative; think about how you can make his life better. And you as an operations guy should try to better understand the products that you work with and look for ways to make them better.

In their presentation, John and Paul shared some tips on how they applied those concepts at Flickr: the tools they used and the culture changes they made. I'd like to highlight some of them here, but I really recommend that you check out their presentation.

Tools

  • Automated infrastructure: Continuous integration, application containers, cloud infrastructure... There is so much to say on this topic that it would become a post inside a post. The key is to automate everything you can to avoid mistakes. Remember that as humans we are error prone;

  • One step build and deploy: This could be a sub-topic of the previous item, but I wanted to give it special attention;

  • Shared metrics: The operations guy shouldn't be the only one who knows how the application is behaving. Everyone on the team should be able to see it at any time (I would say that even the client should be able to see it whenever he wants, but I know that sometimes this is not possible);
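
To make the "one step build and deploy" idea a bit more concrete, here is a minimal sketch in Python. It is not what Flickr used and it is not taken from the presentation; the `run_pipeline` helper and the step names are hypothetical, made up only for illustration. The idea it tries to show is that a single entry point runs every step in order and stops at the first failure, so nobody has to remember a checklist by hand.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair. Both are hypothetical placeholders:
# in a real project each action would call your build tool, test runner, etc.
Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run each step in order and stop at the first one that raises."""
    log: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # Record the failure and skip every remaining step,
            # so a broken build or test run is never deployed.
            log.append(f"{name}: FAILED ({exc})")
            break
        log.append(f"{name}: ok")
    return log


if __name__ == "__main__":
    # Placeholder steps that only print; replace them with real commands.
    pipeline: List[Step] = [
        ("build", lambda: print("building...")),
        ("test", lambda: print("running tests...")),
        ("deploy", lambda: print("deploying...")),
    ]
    for line in run_pipeline(pipeline):
        print(line)
```

The returned log is also a small nod to shared metrics: if it is published somewhere the whole team can see, both developers and the operations guy know exactly which step broke.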

Culture

  • Respect: Respect other people’s expertise, opinions and responsibilities. Remember that everyone on the team wants the same thing: to create something awesome.

  • Visibility: Don't hide things! Talk openly about potential problems and try to build contingency plans together. Everyone screws up sometimes, and it is better to find a mistake as fast as possible so it can be fixed or contained.

  • Trust: Everyone needs to trust that everyone else is doing their best for the business. Involve operations guys in feature discussions and developers in infrastructure discussions, and let them have access to the systems. You're all on the same team!

  • Healthy attitude about failure: Failures will happen and you can't waste time looking for the culprit. It's better to spend that time thinking about what to do when something goes wrong. "If you think you can prevent failure then you aren’t developing your ability to respond".

I believe that the most important thing is this: remember that it doesn't matter what you do on the product; you are as responsible for it as anyone else. Commit yourself to the product and help each other achieve better quality software.

References