Building an IT Engineering Team

tlark included in Leadership Team Management Collaboration

2024-03-05 2965 words 14 minutes

I started working in tech back in 1999. My first tech job was as a repair technician at a computer store chain. I did hardware and software repairs, warranty repairs, OEM systems building, and similar support-type work. Over the years I’ve likely had a similar experience to most other workers in tech. I have had bad bosses and good bosses. I have also worked with great people and not-so-great people. I will say that I am lucky in the sense that most folks I have worked with have been great, and the not-so-great experiences were few.

Throughout my career I have done technician work, IT support roles, System Administration, and System Engineering, and I have done so in both the public and private sectors. As of last year I finally took the plunge and jumped from IC (individual contributor) to M (manager). I also worked in the vendor space, traveled internationally for work, and got to peek behind many curtains. I ended up in Silicon Valley, and then was the main onsite engineer for many Silicon Valley companies. I have been exposed to many philosophies and methodologies, and have interacted with tons and tons of really smart and amazing people.

Over time, you start to take mental note of things. Things that work, that don’t work. That scale and don’t scale. These experiences are oftentimes also big teaching moments for you and your professional growth. So, all things considered this is how I have come to where I am currently at in regard to building an IT engineering team. I started at my current job in 2019 with the agreement that when the team expanded, I could build the team how I saw fit. My boss at the time agreed, and over the next 5 years I built the team out, designed the roadmaps, and eventually became the team manager.

Get Rid of What You Don’t Like

One of the first things I established when I started at my current place of employment was to get rid of everything I did not like from my previous jobs. If I was going to get top talent to join me, I should have a great environment to work in, and I should give the best potential opportunities to folks wanting to join the team here. So, I decided that everything I did not like about IT or Engineering I would throw out, unless my boss overrode my decision, which I was 100% willing to let happen if or when it came up. I first tackled packaging and third-party app patching. I set up and open sourced automatic patching and a set of DEPNotify enrollment starter scripts. I did not want to waste any time building packages or dealing with device enrollment, so I cranked out these two things in my first 3-4 weeks. Later on I retired the AutoPkg + JSSImporter workflow in favor of GitOps Munki.

I started making a list of things I wanted to accomplish as the sole CPE and eventual team lead as we expanded here. I work for a data company, so data skills were already in the back of my mind, but trying to find a CPE who is also good at data work could have overcomplicated my search for team members. So, it was a nice-to-have. Here is the wish list I started with when building out the team:

No overcomplicated processes or workflows that we control
No scrum or project management that doesn’t have a positive return on investment (for our work)
No manual labor when we can help it
Diversify Skills
Diversify Tools
Be able to pivot
Avoid vendor lock-in as much as possible
Build with cloud, security, and engineering first philosophies
Always leverage server-less where we can

To provide some context on how I approached these ideas, here are some things I have built or established in the past 5+ years. We have no dedicated QA, as I feel QA is largely misunderstood in tech in general, and even more so on the IT/Ops side of the world. Instead, we adopted a peer review model. You are never allowed to publish your own work into production unless another human signs off on it. This means QA is now everyone’s job, and it meant I could focus on hiring engineers who can code and automate. Furthermore, I had the team build a CI/CD automation pipeline. With the features of git and things like branch protection, I could now force another human to look at someone’s work before it gets merged back to a main branch. I also had pull requests to the prod branch disabled, so the only way the prod branch gets updated is by promoting our stage branch to production. This also means the entire team collaborates within our build tools and our development environment, which helps with consistency across individuals’ work.

I have seen too many bad practices from IT engineers over the years, and I know that even a human with the best intentions can take shortcuts when they feel pressured to do so. I have committed bad practices myself once or twice. Setting up a GitOps workflow with branch protection and forcing at least one peer review before you can merge solves a lot of problems I have seen IT shops have in my lifetime. It also vastly improves delivery, because the basic day-to-day system administrator tasks are now automated and done by a single computer in the cloud running our CI pipeline.
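The merge rules described here can be sketched in a few lines of Python. This is only an illustration of the policy, not our actual pipeline or any git host's API: the names `ChangeRequest`, `can_merge`, and `PROTECTED_PROMOTIONS` are hypothetical, and applying the peer review rule to the stage-to-prod promotion itself is an assumption. In a real setup the git host's branch protection settings would do this enforcement.

```python
from dataclasses import dataclass, field

# Hypothetical policy table: a protected branch maps to the only branch
# that is allowed to be merged into it. Here prod is updated only from stage.
PROTECTED_PROMOTIONS = {"prod": "stage"}


@dataclass
class ChangeRequest:
    """A simplified stand-in for a pull request (illustrative only)."""
    author: str
    source_branch: str
    target_branch: str
    approvals: set[str] = field(default_factory=set)


def can_merge(cr: ChangeRequest, min_reviews: int = 1) -> tuple[bool, str]:
    """Return (allowed, reason) for a change request under the sketched policy."""
    # Rule 1: no direct pull requests to prod; it only accepts a promotion from stage.
    required_source = PROTECTED_PROMOTIONS.get(cr.target_branch)
    if required_source is not None and cr.source_branch != required_source:
        return False, f"{cr.target_branch} only accepts promotions from {required_source}"

    # Rule 2: at least one other human must sign off; self-approval does not count.
    peer_approvals = cr.approvals - {cr.author}
    if len(peer_approvals) < min_reviews:
        return False, "needs sign-off from at least one other person"

    return True, "ok"


if __name__ == "__main__":
    examples = [
        ChangeRequest("alice", "feature/patching", "stage", {"bob"}),
        ChangeRequest("alice", "feature/patching", "stage", {"alice"}),
        ChangeRequest("alice", "feature/patching", "prod", {"bob"}),
        ChangeRequest("alice", "stage", "prod", {"bob"}),
    ]
    for cr in examples:
        print(cr.source_branch, "->", cr.target_branch, can_merge(cr))
```

The point of keeping both rules in one place is that nobody, including the team lead, has a path to production that skips a second pair of eyes.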

Servers are a pain point. No matter how you look at it, they are a pain point for every team. They require labor to stand up, maintain, patch, and configure, and let’s not forget all the compliance scans you are now subjected to for every server you own. To mitigate this pain point, I have set a precedent that we will always try to leverage server-less compute first and foremost: things like AWS Fargate, Lambda functions, cloud storage, CDNs, PaaS, API Gateways, and so on. You cannot install agents on, run vulnerability scans against, patch, or even log into any of the computers that run those services, so that work goes away. I blogged about our server-less Munki setup before as well.

Diversify Everything You Can

One mistake I have witnessed over and over again in my career is that orgs keep hiring for specific tools, or they hire a team of people with the same skills. I feel both of these ideas can work, but they are ultimately a suboptimal way to build an IT engineering team. If I hired a bunch of Mac Admins, that is what I would have, and it would not expand much beyond that. I wanted to hire for specific skills and experience, and I wanted to diversify those skills and experiences across the team. So, I hired Windows Engineers, macOS engineers, DevOps engineers, Sys Admins, Mobile Engineers, VDI experts, etc. A diversified set of skills and experiences will enable your team to build so much more, and do so from a broader point of view. I feel that this avoids things like echo chambers, and poor decisions that build up exponentially more tech debt over time, debt that could have been avoided if you had pivoted to something else instead.

So, for the first hires on our team, I went out and got a Sys Admin to help out with the day-to-day work, DevOps engineers, and a Windows Engineer to be the SME for our Windows deployment. I feel we have positioned ourselves to be more agile and able to pivot across many tech stacks if the business needs us to. It also offers opportunity for cross-training, mentorship, and the ability to actually try something new and not be siloed into one aspect of tech work. I have been siloed before, and it was always a driver for me to start looking for a new job when it happened. Having a diversified team positions you to accomplish so much more than a bunch of people with the same overlapping skills and experience.

Diversify your tool stack as well. Another massive problem I have seen with IT teams is that they will pile so much tech debt on top of an existing tool because they can, and perhaps because it is the easiest way to get something done. I have deliberately avoided using any of our MDM tools for anything that is business critical, or integrating our MDMs with anything that would lock us into that MDM vendor. I don’t want to be in a position where we are unhappy with our software vendors, but cannot migrate away from the tools we have because we made bad design decisions early on and then, over time, piled so many things on top of those decisions that the tech stack became a giant mess. Bad decisions happen, we make them all the time. I have made plenty myself. What makes a bad decision worse is piling it onto something that locks you into that decision for life. I don’t want to be the team that cannot pivot or modernize because we decided to build a monolith and pile onto it for years, versus diversifying our tech stacks to use the tools that make the most sense.

Build SMEs and Cross Train

This is something that is very tough to do. I am still trying to figure this out, to be honest. Everyone has to work and deliver in a tech job. Employers aren’t going to pay you to just learn things all day; they pay you to do work. So, we started assigning folks a subtitle of SME in their specific area. We have SMEs in Windows, macOS, Linux, Cloud, CI/CD, VDI, Mobile, DevOps, and so on. A person can be an SME in more than one subject as well; there aren’t any real hard rules here. I also expect all my Sr. Engineers and above to be mentors in some capacity, as well as mentees. No one can know everything, so everyone has an opportunity to learn.

So, I drive collaboration to our team channels in Slack, and I will identify things I feel are IC1 level work in a specific subject and then farm them out to the team for anyone who is new to that specific tech subject and wants some experience. For example, I might identify something that is a nice-to-have and that I would consider IC1 or IC2 engineering level work. I simply create a Jira task for it and then ask the team, “Who needs practice in this subject?”

The struggle here is you must train everyone to not just fix things in a matter of minutes. A Senior level engineer or higher should be able to do these types of things in minutes, but to a beginner they could be a slight challenge, and we want to provide hands-on experience to grow the team’s collective skills. So, now I am having all my engineers backlog anything that could be considered IC1 or IC2 level work and is not a high priority. Then anyone who doesn’t have experience in that subject can grab that IC1/IC2 level work from the backlog and attempt it. The big problem here is training your SMEs to not just instantly fix the problem, or implement the nice-to-have immediately. I feel this is a good method for cross-training.

Optimize From Past Experiences

I have seen scrum done many ways across multiple jobs. I have had jobs where every team just gets to run their own concept of scrum. I have seen upper management force a top-down model of scrum across an entire org. I have seen ClickOps workflows in Jira that would just annoy you, because they are manual labor where you must click through every single step for each item. I have also seen things that should never be in scrum all over it. Take updating packages: this is a sustaining task that should be automated, not something to waste hours of paperwork on. I am not saying scrum is bad, because it really comes down to how you implement it. I am not saying an org should or should not use scrum. I am honestly indifferent to it as a framework/tool, but I have seen so many iterations of it that make no sense.

Take story points, which in my view are supposed to only measure capacity and priority. So many orgs use them as a measurement of performance, which they aren’t meant to be. I have even heard that story points will justify more headcount, if a team demonstrates the need by the number of story points they can deliver. I have never in my life seen a headcount get created due to story point data. I will wait until I see that happen!

So, I tossed out everything about project management and scrum, started with plain agile, and decided we would build this out over time. We ended up with something like this for our work:

Simple kanban board
No story points
No estimations
Focus on quarterly deliverables vs managing sprints
Status updates in a dedicated status updates Slack channel
Meet when we need to by going direct
1 team meeting per week

Making these changes from every other job I have had in the past 10-15 years has made life so much easier for us. There is no time-wasting or pressure to estimate everything you do in a scrum board. We don’t have daily stand-ups; you just type your status update into a Slack channel. I try to give as much time back to my ICs as I can, so they can spend that time developing solutions, peer reviewing other team members’ work, collaborating and cross-training, and mitigating burnout.

Now, I would like to also say, the things I got rid of could easily come back. I want to position our team where we can decide what works best for us, and what makes us most productive. If going back to scrum + sprints does this, then we shall adopt that. If it doesn’t have a ton of value, then we will not. We want to remain agile and be able to pivot whenever we must, and do so with the least amount of friction.

Also, this is for our work. In cross-team collaboration scenarios there is likely a PM, and the rules change. We are agile and will adapt to how a PM wants to manage a project. We will give estimates of when we can get the work done, and we try to be honest about timelines.

Don’t Worry About Failure

If there is one thing I have learned over the years, and that I try to establish as a philosophy for our team, it is that failure is just a part of the process. You just need to keep going, keep progressing, and not let failure block you from expanding things. Failure is also an opportunity to take mental notes of what doesn’t work. Remember, you may not always know what to do, but you likely have a list of things not to do that you know from previous experience. If your team members are afraid to fail, then they may not take risks or try things that might end in what they perceive as failure.

Encourage taking risks. If I had played it safe my entire career, I would have never grown in skills or experience. Many years ago I was afraid to swap to a new programming language. I knew the languages I already used pretty well, or well enough to do most of the things I needed to do in my job. I was afraid that it would just cause complexity, and I had definite hurdles in my way. However, one day I just decided, no more code in the languages I know! I only wrote code in the new language I was trying to learn, and I grew more and faster than I ever had, because I learn best by doing actual work.

I often encourage folks in the Mac Admin Slack to expand into new languages and new tech. What if you were a Mac Admin who also knew Linux? Perhaps you are a Mac Admin who can code in Swift, Python, and Go? Let’s not forget everyone’s favorite in the Mac Admin Slack: what if you also knew how to manage Windows devices? No one is born with any knowledge or skills, and humans must train or practice over time to build those things. Doing so makes you a better engineer, and the more diversified your skill sets are, the more desirable you are to employers.

Also, I have only been a manager for a year now. I have so much to learn in this role, and I oftentimes have no idea what I am doing. I ask for help and guidance from my manager and my peers. I go to LinkedIn to read leadership articles. I try to find out what other managers in tech are doing with their teams. I will likely fail a lot while I learn how to transition from an IC to a Manager, and I will likely learn a bunch in that process. This is okay, this is to be expected. Hopefully I have learned enough to keep my failures from impacting anyone else. The point is, I am no longer letting fear of failure stop me from trying to grow and learn in my life.