My Podcast Playlist for 2017

I have a lengthy train commute to work, so podcasts are a lifeline for me on several levels. They give me something to fill the time. But much more importantly, they keep me fresh with what’s going on in technology. If you’re looking for something to put in your ear this year, consider this list. I personally listen to and vouch for every one of them.

Datanauts, from the Packet Pushers network. Their stated mission is to “explore the latest data center innovations including storage, virtualization, networking and convergence”, as well as “bust silos”. And they do a pretty damn good job of it. The hosts are a CCIE and a VCDX respectively, so they know their stuff. And yet they are also adept at getting out of the way and letting their equally interesting guests come on and do their thing. A recent episode with Charity Majors was particularly fascinating to me, and I have a ton of tabs open for follow-up reading. This might actually be my favorite tech podcast right now.

Speaking of the Packet Pushers, I also really enjoy their Network Break podcast. It’s a quick 30-minute hit of the week’s highlights in networking news: product announcements, trends, acquisitions, etc. Superficial and mostly on the business side. Just the right level for anyone who doesn’t do networking full time but wants to keep up.

Software Engineering Radio is hit or miss for an Ops person. Sometimes it’s geeking out over the best features in the new C++ standard. Other times it’s solid gold discussion of things like salary negotiation, or Apache Spark, or some other new tech you’re going to have to support in production tomorrow. Their guests tend to be The Authority on whatever the subject is (like the inventor of PowerShell or Golang). So I subscribe to the feed, aggressively skip topics, and then listen with rapt attention when something good comes along because they are probably talking to the world’s foremost authority on it.

If you ever touch a Microsoft technology, RunAs Radio should be your very first stop for news. Host Richard Campbell is very plugged into that world, and the caliber of guests he gets every week reflects that.

Arrested DevOps is a great show on the eponymous topic of DevOps. There have been many podcasts in this space, but ADO is one of the last ones standing. And still one of the best. As “DevOps” is a broad and loaded term, the show covers a ton of different topics. Take a look at the episode backlog and see if a few tickle your fancy. And when you’re done with those, listen to the rest anyway!

Finally, I’ll throw out Software Defined Talk. Hosted by the one-and-only Michael Coté with a couple other dudes, it’s a hilarious roundtable of tech news and their takes on it. Plus useful recommendations on Costco deals. Just listen already. It’s highly informative, witty, and far better than I can make it sound.

I follow a few other shows, mostly hoping they return from limbo and post a new episode. I could do a whole other post on dead podcasts whose back catalog is must-hear stuff (RIP The Ship Show, and DevOps Cafe is an all-time great but on life support). But the above shows are my weekly mandatory listening going forward.

How about you? What podcasts am I missing?

Nginx Load Balancer Improvements to proxy_next_upstream

This change happened in March of 2016, but was still news to me when I stumbled across it recently. So I wanted to share since it’s important but didn’t seem to be loudly broadcast. Nginx is no longer dangerously bad at load balancing!

Among the many features of the outstanding Nginx webserver is the ability to act as a load balancer. Using the built-in upstream module, you can define a pool of backend app servers that should take turns servicing requests. And in theory, you can tell nginx to skip a server if it is down or returning error (HTTP 5xx) responses.
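As a rough sketch of what that looks like, a minimal pool might be defined as follows. The pool name app_backend and the 10.0.0.x addresses are placeholders made up for illustration, and the max_fails and fail_timeout values are arbitrary examples of the knobs that control when nginx temporarily skips a server.

```nginx
# These blocks belong inside the http { } context.
# Hypothetical pool; the name and addresses are placeholders.
upstream app_backend {
    # Requests are distributed round-robin by default.
    # After max_fails failed attempts within fail_timeout, the server
    # is considered unavailable for the next fail_timeout period.
    server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;

    location / {
        # Hand requests to the pool defined above.
        proxy_pass http://app_backend;
    }
}
```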

In practice, however, Nginx’s handling of downed servers can be very dangerous. As one Hacker News thread notes, when a server returns an error, Nginx will by default always retry the request on a second server. This is fine most of the time. But what if the request was “charge $10,000 to my credit card”? Maybe the server correctly applied the charge, but then failed while rendering the confirmation page and returned an error. Well, get ready for some real angry customer support calls. Nginx would have resubmitted that same $10,000 charge over and over until a server responded with an HTTP 200 OK.

For this reason, many admins recommend setting the value proxy_next_upstream off;. This makes a failed backend request simply return an error page instead of retrying it on another server. Definitely not ideal; who wants their users to see error pages? But better than handling a deluge of chargebacks from outraged customers who were billed multiple times. In reality, this often meant admins chose another, specialized tool for their load balancing needs, like HAProxy or an expensive hardware appliance from the likes of F5 or A10.

But wait! With the release of Nginx 1.9.13, things got better. Once a request has been sent to a backend, Nginx will now never retry it on another server if it is “non-idempotent”, unless you explicitly tell it to. Idempotent means that no matter how many times you perform an action, it always has the same result. So this excludes POST, along with the more obscure LOCK and PATCH methods.
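Here is a hedged sketch of how the retry behavior can be spelled out in a config. The upstream name app_backend and the /search location are hypothetical, and the choice of which conditions to retry on is only an example, not a recommendation for any particular app.

```nginx
location / {
    proxy_pass http://app_backend;   # hypothetical upstream name

    # Retry on connection errors, timeouts, and selected 5xx responses.
    # On 1.9.13 and later, POST, LOCK, and PATCH requests are not retried
    # once they have been sent to a backend.
    proxy_next_upstream error timeout http_502 http_503;

    # Cap the total number of attempts per request (illustrative value).
    proxy_next_upstream_tries 2;
}

# Hypothetical endpoint where repeating a POST is known to be harmless.
location /search {
    proxy_pass http://app_backend;

    # Explicit opt-in: allow retries of non-idempotent requests too.
    proxy_next_upstream error timeout non_idempotent;
}
```

The non_idempotent parameter is the explicit opt-in mentioned above, so it only belongs on endpoints where a duplicate request cannot do damage.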

So if you’re still running with proxy_next_upstream off; in your config because of those concerns, it’s time to test removing it. Nginx’s load balancing is much safer and saner than it was this time last year.

Rundeck Performance Tuning With MySQL

At my current job, we use a tool called Rundeck to automate a slew of tasks. I initially stood up a test instance on a small VM, so people could kick the tires and decide if it was useful. Before I knew it, five or six dev teams were running dozens of critical jobs out of there, raving about its power, flexibility, and visibility. It had been voted into production whether I liked it or not.

There was just one problem: Web performance was awful. I’m talking “click a link, go get a fresh cup of coffee, come back, and the page is still loading” slow. Literally. It had started out fine, but as popularity grew, it quickly became unusable.

Hours of fruitless troubleshooting later, I came across a GitHub issue mentioning some missing indices on MySQL tables. Hmm, we use MySQL… searching for that table name led to a couple more issues and mailing list threads homing in on one issue: Add these indices, and perf is fixed. I gave it a shot.

Back up your database before running these commands. They should be harmless, but better safe than sorry.

ALTER TABLE workflow_workflow_step ADD INDEX workflow_commands_id ( workflow_commands_id );
ALTER TABLE workflow_workflow_step ADD INDEX commands_idx ( commands_idx );

Performance immediately improved for us. Load times on problem pages went from minutes to a second or less. Again, literally.
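If you want to check whether the indices already exist before adding them, or confirm they were created afterward, something like the following should work from the MySQL client. The value 1 in the second query is an arbitrary placeholder, not a real ID from any Rundeck install.

```sql
-- List the indices on the table; look for workflow_commands_id
-- and commands_idx in the Key_name column of the output.
SHOW INDEX FROM workflow_workflow_step;

-- Optionally, see whether a lookup on the indexed column would use the index.
-- The value 1 is a placeholder.
EXPLAIN SELECT * FROM workflow_workflow_step WHERE workflow_commands_id = 1;
```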

Why these indices are not created by default, I couldn’t say. For all I know, there’s a great reason, but it’s never been articulated by the Rundeck developers. There are (ignored) GitHub issues and mailing list threads mentioning them dating back years. Which is a shame, because Rundeck is a great tool. But once you get beyond a handful of frequently running jobs, it borders on unusable until you fix this problem.

That is also unfortunately a metaphor for Rundeck itself. Hard-to-find yet crucial tweaks and confusing interfaces challenge operators and users at every turn. I think it’s an A+ idea that suffers from some unfortunate C- documentation, design, and operational decisions. It’s an outstanding tool for fostering a DevOps culture that, in a beautifully ironic twist, feels like software a dev team threw over the wall to ops with no care in the world for those who had to operate or use it in production.

End rant. As I said, we really do use Rundeck extensively despite all that. I’ll try to post some more real-world stories of use cases, tweaks, and gotchas on this blog in the future.