I have been building a security engineering team for the past nine months. Here are a few things I have learned.

Photo by Matthijs van Heerikhuize on Unsplash

It is going to be challenging.

First, let us acknowledge that security recruiting is hard. It is very, very hard, possibly the hardest kind of engineering recruiting out there. When I realized that I was struggling to get good candidates in the door, I reached out to friends, former colleagues, and other directors and VPs in my network, hoping to learn from their wisdom. Their first responses were all the same:

Having worked in a few startups that hired rapidly, I have noticed that there is always a moment when someone exclaims that some department can no longer fit in the largest conference room. It is said aloud, sometimes with a subtle chuckle, but always with some pride in the acknowledgment that “this is how far we have come everybody but never forget I am Employee #17.”

This is how far we have come everybody but never forget I am Employee #17. — Employee #17

(I must admit I was that guy once.)

I have also noticed that…

Wear all the hats! Photo by Megan Markham on Unsplash

When I first dipped my toes into management, my company and I worked out an arrangement where I was effectively both a tech lead and a manager (also known as a “TLM”). It was a great way to transition into a new management role. In hindsight, I stayed in this dual role for too long (years), and I would not recommend it for first-timers. I now believe that, with proper coaching and guidance, the transition period for new managers should last no longer than two quarters.

Taking on new roles demands a great deal of attention to identify the…

Co-author: Anuj Desai

Traditional backlog management — Photo by Kelly Sikkema on Unsplash

Anuj and I have had the privilege of working with several teams on their roadmaps and sprint plans over the last few years, and we noticed that it is not always obvious what a healthy backlog should look like, or how it can be maintained. Unhealthy backlogs, on the other hand, are much more apparent.

Let’s declare <X> bankruptcy.

- Each of us, at some point in our careers, where X is often email, backlog, and sometimes PagerDuty.

Too Long

The foremost example of an unhealthy backlog is one that grows in length uncontrollably. Week after week, more than…

If only career ladders were upright and obvious — Photo by Max Ostrozhinskiy on Unsplash

Promotions and compensation adjustments are some of the most important functions of management. By electing to recognize (or not recognize) individuals and their contributions, we construct a system of incentives that encourage and reward certain behaviors. This shapes the culture of the company and is an important aspect of talent development.

Am I going to get promoted this quarter? — Everyone.

When adjusting the dual levers of promotions and raises, we want to:

  1. identify role models for other aspiring individuals.

Together, these create virtuous cycles of learning…

Don’t block the loop. Photo by Chris Arock on Unsplash

This is a short explainer of event-driven servers, intended to help readers gain an intuitive understanding of event loops. It could be useful when:

  • comparing Apache HTTP Server and NGINX
  • choosing concurrency models for gunicorn
  • troubleshooting an event loop

Concurrent Servers

In the classic client-server architecture, a server accepts connections from clients, receives data on the new socket, forwards it to the application for processing, and then sends data back to the client on the same socket.
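To make that accept, receive, process, send cycle concrete, here is a minimal sketch of an event-driven version built on Python's standard `selectors` module: a single thread waits for readiness events and services whichever sockets are ready, rather than dedicating a thread or process to each connection. The names `process` (which simply upper-cases bytes) and `run_event_loop` are hypothetical, and the `max_clients` exit condition exists only so the sketch terminates.

```python
import selectors
import socket


def process(request: bytes) -> bytes:
    # Hypothetical stand-in for "the application": upper-case the bytes.
    return request.upper()


def run_event_loop(server: socket.socket, max_clients: int) -> None:
    """Serve clients on one thread; return once `max_clients` connections have closed."""
    sel = selectors.DefaultSelector()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, data=None)  # data=None marks the listener
    closed = 0
    while closed < max_clients:
        for key, mask in sel.select(timeout=1.0):
            if key.data is None:
                # The listening socket is readable: a client is waiting to be accepted.
                conn, _addr = server.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ, data=bytearray())
                continue

            conn, outbox = key.fileobj, key.data  # outbox holds bytes not yet sent
            if mask & selectors.EVENT_READ:
                chunk = conn.recv(4096)
                if not chunk:
                    # An empty read means the client closed its end.
                    sel.unregister(conn)
                    conn.close()
                    closed += 1
                    continue
                outbox += process(chunk)  # in-place append to the shared bytearray

            if outbox:
                try:
                    sent = conn.send(outbox)
                except BlockingIOError:
                    sent = 0  # kernel send buffer is full; retry once writable
                del outbox[:sent]

            # Ask to be woken for writability only while bytes are still pending.
            wanted = selectors.EVENT_READ | (selectors.EVENT_WRITE if outbox else 0)
            if wanted != key.events:
                sel.modify(conn, wanted, data=outbox)
    sel.close()
```

Nothing in this loop waits on any one socket: `recv` is called only on sockets the selector reports as readable, and bytes the kernel cannot accept yet stay in a per-connection buffer until the socket becomes writable. If `process` did something slow, such as a synchronous disk write, every connected client would stall behind it.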

Not everyone lives near AWS us-east-1. Photo by Anastasia Dulgier on Unsplash

At the beginning of 2019, Engineering@Affirm set aggressive performance goals for our React apps and affirm.js¹ to improve user experience. To drive this effort, we started by improving instrumentation and measuring, in granular detail, the performance of each of our apps and of affirm.js. Shortly after, we coordinated an organization-wide effort and prioritized optimization projects, including code-splitting and CDN improvements, across engineering teams.

The common web page optimizations are well covered by several other articles. …

How much are you learning from your postmortems?

The fire that never was. Photo by Piotr Chrobot on Unsplash

Young startups often follow a familiar narrative: in the pursuit of product-market fit, engineers march to the drumbeat of “move fast and break things”. The company values speed of execution and agility over everything else. However, as systems become more complex (for example, through multiple product pivots), failures happen. Eventually, as the business gains significance and prominence, incidents and breaches become increasingly painful and costly. The company looks to its larger peers for guidance, and finds Google’s SRE book, or chances upon some of John Allspaw’s writing.

Following the book’s recommendations, a…

Photo by Patrick Lindenberg on Unsplash — a picture of SSDs would be much less cool looking.

Earlier this year, I helped load test a gunicorn application on an EC2 instance. This was a fifth-generation EC2 instance running a modern Linux distribution, and gunicorn was writing log files to a single EBS gp2 volume. To our collective awe, we found that the application was I/O-bound on writing logs to disk.

Even more surprisingly, switching that instance to a RAID 0 array striped across two EBS volumes worked really well: we were able to double the number of gunicorn workers on the instance, at which point it hit the next limiting factor.
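Log writes can stall a request because `logging` handlers such as `FileHandler` run `emit()` on the same thread that made the logging call, so a slow volume blocks whoever logged. One standard-library mitigation, not necessarily what the full article goes on to recommend, is to hand records to a `QueueHandler` and let a `QueueListener` background thread do the file writes. The helper name `configure_async_file_logging` and the log format below are hypothetical.

```python
import logging
import logging.handlers
import queue


def configure_async_file_logging(path: str) -> logging.handlers.QueueListener:
    """Hypothetical helper: queue log records so only a background thread touches the disk."""
    log_queue = queue.Queue()  # unbounded: records accumulate if the disk falls behind

    # The slow part: this handler performs the actual file I/O.
    file_handler = logging.FileHandler(path)
    file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

    # The listener thread pulls records off the queue and passes them to file_handler.
    listener = logging.handlers.QueueListener(log_queue, file_handler)

    # Application threads only enqueue records, which is cheap.
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    root.addHandler(logging.handlers.QueueHandler(log_queue))

    listener.start()
    return listener
```

This moves the write off the request path but does not reduce total disk I/O, so an overloaded volume still shows up as a growing in-memory queue. Each gunicorn worker process would also need its own listener, because the listener thread does not survive `fork()`.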

Logging in Python is Synchronous

In both Python 2…

A tale from Python at “scale”.

Basically the clone wars. Photo by Hello I'm Nik on Unsplash

A Thought Experiment

Suppose we have a web app named MyApp. It could be built on Django, Spring, or Ruby on Rails, but it started out as a single, small application: all the code is in one repository, everything is deployed as a single artifact, and all its tables live in the same database. As the app grows and attracts more users, it accumulates more data. It also gains more developers and more database tables, and it is hosted on more machines. The codebase starts to snowball. As we get more successful, we try to scale.

It depends on which issues start becoming…

James Lim

Engineering@Samsara (ex-Affirm) - jimjh.com
