Spotify Engineering Culture - Part 1
One of the big success factors here at Spotify is our agile engineering culture. Culture tends to be invisible; we don't notice it because it's there all the time, kind of like the air we breathe. But if everyone understands the culture, we're more likely to be able to keep it and even strengthen it as we grow. So that's the purpose of this video.
When our first music player was launched in 2008, we were pretty much a Scrum company. Scrum is a well-established agile development approach, and it gave us a nice team-based culture. However, a few years later we had grown into a bunch of teams and found that some standard Scrum practices were actually getting in the way, so we decided to make all this optional. Rules are a good start, but then break them when needed. We decided that agile matters more than Scrum, and agile principles matter more than any specific practices. So we renamed the Scrum master role to agile coach, because we wanted servant leaders more than process masters. We also started using the term squad instead of Scrum team, and our key driving force became autonomy.
So what is an autonomous squad? A squad is a small, cross-functional, self-organizing team, usually less than eight people. They sit together, and they have end-to-end responsibility for the stuff they build: design, commit, deploy, maintenance, operations, the whole thing. Each squad has a long-term mission, such as "make Spotify the best place to discover music", or internal stuff like "infrastructure for A/B testing". Autonomy basically means that the squad decides what to build, how to build it, and how to work together while doing it. There are of course some boundaries to this, such as the squad mission, the overall product strategy for whatever area they are working on, and short-term goals that are renegotiated every quarter.
Our office is optimized for collaboration. Here's a typical squad area. The squad members work closely together here, with adjustable desks and easy access to each other's screens. They gather over here in the lounge for things like planning sessions and retrospectives, and back there is a huddle room for smaller meetings or just to get some quiet time. Almost all walls are whiteboards.
So why is autonomy so important? Well, because it's motivating, and motivated people build better stuff. Also, autonomy makes us fast by letting decisions happen locally in the squad instead of going through managers and committees and stuff. It helps us minimize handoffs and waiting, so we can scale without getting bogged down with dependencies and coordination.
Although each squad has its own mission, they need to be aligned with product strategy, company priorities, and other squads; basically, be a good citizen in the Spotify ecosystem. Spotify's overall mission is more important than any individual squad. So the key principle is really: be autonomous, but don't sub-optimize. It's kind of like a jazz band. Although each musician is autonomous and plays their own instrument, they listen to each other and focus on the whole song together. That's how great music is created. So our goal is loosely coupled but tightly aligned squads. We're not all there yet, but we experiment a lot with different ways of getting closer. In fact, that applies to most things in this video: this culture description is really a mix of what we are today and what we are trying to become in the future.
Alignment and autonomy may seem like different ends of a scale, as in more autonomy equals less alignment. However, we think of it more like two different dimensions. Down here is low alignment and low autonomy: a micromanagement culture, no high-level purpose, just shut up and follow orders. Up here is high alignment but still low autonomy, so leaders are good at communicating what problem needs to be solved, but they are also telling people how to solve it. High alignment and high autonomy means leaders focus on what problem to solve but let the teams figure out how to solve it. What about down here, then? Low alignment and high autonomy means teams do whatever they want and basically all run in different directions. Leaders are helpless, and our product becomes a Frankenstein.
We're trying hard to be up here, aligned autonomy, and we keep experimenting with different ways of doing that. So alignment enables autonomy: the stronger alignment we have, the more autonomy we can afford to grant. That means the leader's job is to communicate what problem needs to be solved and why, and the squads collaborate with each other to find the best solution.
One consequence of autonomy is that we have very little standardization. When people ask things like "Which code editor do you use?" or "How do you plan?", the answer is mostly "Depends on which squad." Some do Scrum sprints, others do Kanban; some estimate stories and measure velocity, others don't. It's really up to each squad. Instead of formal standards, we have a strong culture of cross-pollination. When enough squads use a specific practice or tool, such as Git, that becomes the path of least resistance, and other squads tend to pick the same tool. Squads start supporting that tool and helping each other, and it becomes like a de facto standard. This informal approach gives us a healthy balance between consistency and flexibility.
Our architecture is based on over a hundred separate systems, coded and deployed independently. There's plenty of interaction, but each system focuses on one specific need, such as playlist management, search, or monitoring. We try to keep them small and decoupled, with clear interfaces and protocols. Technically, each system is owned by one squad (in fact, most squads own several), but we have an internal open source model, and our culture is more about sharing than owning. Suppose squad one here needs something done in system B, and squad two knows that code best. They'll typically ask squad two to do it. However, if squad two doesn't have time, or they have other priorities, then squad one doesn't necessarily need to wait. We hate waiting. Instead, they are welcome to go ahead and edit the code themselves and then ask squad two to review the changes. So anyone can edit any code, but we have a culture of peer code review. This improves quality and, more importantly, spreads knowledge. Over time we've evolved design guidelines, code standards, and other things to reduce engineering friction, but only when badly needed. So on a scale from authoritative to liberal, we're definitely more on the liberal side.
Now, none of this would work if it wasn't for the people. We have a really strong culture of mutual respect. I keep hearing comments like "My colleagues are awesome." People often give credit to each other for great work and seldom take credit for themselves. Considering how much talent we have here, there is surprisingly little ego. One big aha for new hires is that autonomy is kind of scary at first. You and your squad mates are expected to find your own solution; no one will tell you what to do. But it turns out that if you ask for help, you get lots of it, and fast. There's genuine respect for the fact that we're all in this boat together and need to help each other succeed.
We focus a lot on motivation here. An example, an actual email from the head of people operations: "Hi everyone, our employee satisfaction survey says 91% enjoy working here and 4% don't." Now, that may seem like a pretty high satisfaction rate, especially considering our growth pain: from 2006 to 2013 we've doubled every year and now have over 1,200 people. But then he continues: "This is of course not satisfactory, and we want to fix it. If you're one of those unhappy 4%, please contact us. We're here for your sake and nothing else." So good enough isn't good enough. Half a year later things had improved, and the satisfaction rate was up to 94%. This strong focus on motivation has helped us build up a pretty good reputation as a workplace, but we still have plenty of problems to deal with, so yeah, we need to keep improving.
Okay, so we have over 50 squads spread across four cities. Some kind of structure is needed. Currently, squads are grouped into tribes. A tribe is a lightweight matrix: each person is a member of a squad as well as a chapter. The squad is the primary dimension, focusing on product delivery and quality, while the chapter is a competency area, such as quality assistance, agile coaching, or web development. As a squad member, my chapter lead is my formal line manager, a servant leader focusing on coaching and mentoring me as an engineer, so I can switch squads without getting a new manager. It's a pretty picture, huh? Except that it's not really true. In reality the lines aren't nice and straight, and things keep changing. Here's a real-life example from one moment in time for one tribe, and of course it's all different by now, and that's okay. The most valuable communication happens in informal and unpredictable ways. To support this, we also have guilds. A guild is a lightweight community of interest, where people across the whole company gather and share knowledge within a specific area, for example leadership, web development, or continuous delivery. Anyone can join or leave a guild at any time. Guilds typically have a mailing list, biannual unconferences, and other informal communication methods. Most organizational charts are an illusion, so our main focus is community rather than hierarchical structures. We've found that a strong enough community can get away with an informal, volatile structure. If you always need to know exactly who is making decisions, you're in the wrong place.
One thing that matters a lot for autonomy is how easily we can get our stuff into production. If releasing is hard, we'll be tempted to release seldom to avoid the pain. That means each release is bigger and therefore even harder; it's a vicious cycle. But if releasing is easy, we can release often. That means each release is smaller and therefore easier. To stay in this loop and avoid that one, we encourage small, frequent releases and invest heavily in test automation and continuous delivery infrastructure. Releasing should be routine, not drama. Sometimes we make big investments to make releasing easier.
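As an illustration of the internal open source model described earlier (ask the owning squad first; if they're busy, edit the code yourself and ask them to review), here is a minimal Python sketch of that decision flow. The ownership map, squad and system names, and the `plan_change` function are all hypothetical; the video describes a cultural norm, not a tool.

```python
from dataclasses import dataclass

# Hypothetical ownership map: each system is owned by exactly one squad.
# System and squad names are made up for illustration.
OWNERS: dict[str, str] = {
    "system_a": "squad_1",
    "system_b": "squad_2",
}


@dataclass(frozen=True)
class ChangePlan:
    implementer: str  # squad that writes the code
    reviewer: str     # squad whose members review the change


def plan_change(requester: str, system: str, owner_has_time: bool) -> ChangePlan:
    """Decide who implements a change to `system` and who reviews it."""
    owner = OWNERS[system]
    if requester == owner or owner_has_time:
        # The owning squad knows the code best, so it does the work.
        # Assumption (not stated in the video): a squad mate does the peer review.
        return ChangePlan(implementer=owner, reviewer=owner)
    # The owner is busy: don't wait. Edit the code yourself, and the owner reviews it.
    return ChangePlan(implementer=requester, reviewer=owner)


print(plan_change("squad_1", "system_b", owner_has_time=False))
# ChangePlan(implementer='squad_1', reviewer='squad_2')
```

The point the sketch tries to capture is that ownership decides who reviews, not who is allowed to edit.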
For example, the original Spotify desktop client was a single monolithic application. In the early days, with just a handful of developers, that was fine, but as we grew this became a huge problem. Dozens of squads had to synchronize with each other for each release, and it could take months to get a stable version. Instead of creating lots of process and rules and stuff to manage this, we changed the architecture to enable decoupled releases. Using Chromium Embedded Framework, the client is now basically a web browser in disguise. Each section is like a frame on a website, and squads can release their own stuff directly. As part of this architectural change, we started seeing each client platform as a client app and evolved three different flavours of squads: client app squads, feature squads, and infrastructure squads. A feature squad focuses on one feature area, such as search. This squad will build, ship, and maintain search-related features on all platforms. A client app squad focuses on making release easy on one specific client platform, such as desktop, iOS, or Android. Infrastructure squads focus on making other squads more effective; they provide tools and routines for things like continuous delivery, A/B testing, monitoring, and operations. Regardless of the current structure, we always strive for a self-service model, kind of like a buffet: the restaurant staff don't serve you directly, they enable you to serve yourself. So we avoid handoffs like the plague. For example, an operations squad or client app squad does not put code into production for people. Instead, their job is to make it easy for feature squads to put their own code into production.
Despite the self-service model, we sometimes need a bit of sync between squads when doing releases. We manage this using release trains and feature toggles. Each client app has a release train that departs on a regular schedule, typically every week or every three weeks, depending on which client. Just like in the physical world, if trains depart frequently and reliably, you don't need much upfront planning; just show up and take the next train. Suppose these three squads are building stuff, and when the next release train arrives, features A, B, and C are done, while D is still in progress. The release train will include all four features, but the unfinished one is hidden using a feature toggle. It may sound weird to release unfinished features and hide them, but it's nice because it exposes integration problems early and minimizes the need for code branches. Unmerged code hides problems and is a form of technical debt. Feature toggles let us dynamically show and hide stuff in tests as well as production. In addition to hiding unfinished work, we use this to A/B test and gradually roll out finished features. All in all, our release process is better than it used to be, but we still see plenty of improvement areas.
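To make the release-train idea more concrete, here is a minimal Python sketch, assuming a simple percentage-based toggle stored per feature. The names `ReleaseTrain`, `Feature`, `bucket`, and `ab_variant` are hypothetical and not Spotify's actual tooling; real toggle systems typically read their state from a configuration service at runtime. Every feature rides the train, unfinished ones are toggled to 0%, and the same hash bucketing supports gradual rollout and A/B splits.

```python
import hashlib
from dataclasses import dataclass, field


def bucket(user_id: str, key: str) -> int:
    """Map a user to a stable bucket 0-99 for a given toggle or experiment key."""
    digest = hashlib.sha256(f"{key}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


@dataclass
class Feature:
    name: str
    done: bool


@dataclass
class ReleaseTrain:
    features: list[Feature]
    # Toggle state per feature: percentage of users (0-100) who see it.
    toggles: dict[str, int] = field(default_factory=dict)

    def depart(self) -> None:
        # Everything merged ships in the build; unfinished work is toggled off.
        for f in self.features:
            self.toggles[f.name] = 100 if f.done else 0

    def is_visible(self, feature: str, user_id: str) -> bool:
        # Unknown features are hidden; otherwise the user's bucket must fall
        # below the rollout percentage.
        return bucket(user_id, feature) < self.toggles.get(feature, 0)


def ab_variant(user_id: str, experiment: str) -> str:
    """Split users 50/50 into groups A and B, stable across sessions."""
    return "A" if bucket(user_id, experiment) < 50 else "B"


train = ReleaseTrain([Feature("A", True), Feature("B", True),
                      Feature("C", True), Feature("D", False)])
train.depart()
print([f for f in "ABCD" if train.is_visible(f, "user-42")])  # ['A', 'B', 'C']

# Later, once D is finished: gradually roll it out, starting with about 10% of users.
train.toggles["D"] = 10
```

Hashing the user ID together with the feature key keeps each user's experience stable between sessions while giving every feature its own independent split.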
So we'll keep experimenting. This may seem like a scary model, letting each squad put their own stuff into production without any form of centralized control, and we do screw up sometimes, but we've learned that trust is more important than control. Why would we hire someone who we don't trust? Agile at scale requires trust at scale, and that means no politics. It also means no fear. Fear doesn't just kill trust, it kills innovation, because if failure gets punished, people won't dare try new things. So let's talk about failure. Actually, no, let's take a break. Get on your feet, get some coffee, let this stuff sink in for a bit, and then come back when you're ready for Part 2.