
Migrations: the sole scalable fix to tech debt.
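The most interesting migration I've taken part in was Uber's migration from Puppet-managed services to a fully self-service provisioning model, where any engineer in the company could spin up a new service in two clicks. Not only could they, they did: by the time the migration was complete, engineers were provisioning multiple services each day, and every newly hired engineer spun up a service from scratch on their first day.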
What made this migration so interesting was the volume. When we started, provisioning a new service took about two weeks of clock time and about two days of engineering time, and we were falling further behind each day. At the time it was more than a little stressful, but it was also a perfect laboratory for learning how to run large-scale software migrations: it was large enough to see even small shifts, and long enough that we got to experiment with a number of approaches.
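Migrations are both essential and frustratingly frequent as your codebase ages and your business grows: most tools and processes only support about one order of magnitude of growth before becoming ineffective, so rapid growth makes migrations a way of life. This isn't because they are bad processes or poor tools. Quite the opposite: the fact that something stops working at significantly increased scale is a sign that it was designed appropriately for the previous constraints rather than being over designed.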
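As a result, you switch tools a lot, and your ability to migrate to new software can easily become the defining constraint on your overall velocity. Given their importance, we don't talk about running migrations very often; let's remedy that!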
Why migrations matter
Migrations matter because they are usually the only available avenue for making meaningful progress on technical debt.
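Engineers hate technical debt. If there's an easy project they can personally do to reduce tech debt, they'll take it on themselves. Engineering managers hate technical debt too. If there's an easy project their team can execute in isolation, they'll get it scheduled. In aggregate, this leads to a dynamic where there is very little low-hanging fruit left for reducing technical debt, and most of the remaining options require many teams working together to implement them: migrations.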
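Each migration aims to create technical leverage ("Your indexes no longer need to fit on a single server!") or reduce technical debt ("Your acknowledged writes are guaranteed to persist through a failover"). They occupy the awkward territory of reducing immediate contribution today in exchange for more capacity tomorrow. This makes them controversial to schedule, and as systems get larger they get more expensive to migrate.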
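Lore tells us that Googlers have a phrase, "running to stand still", to describe a team whose entire capacity is consumed by upgrading dependencies and patterns, such that it can't make forward progress on the product or system it owns. Spending all your time on migrations is extreme, but every midsize company has a long queue of migrations it can't staff: moving from VMs to containers, rolling out circuit breaking, moving to the new build tool; the list extends effortlessly into the sunset.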
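Migrations are the only mechanism for effectively managing technical debt as your company and code grow. If you don't get effective at software and system migrations, you'll end up languishing in technical debt. (And you'll still have to do one later anyway; it's just that it will probably be a full rewrite.)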
Running good migrations
The good news is that while migrations are hard, there is a pretty standard playbook that works remarkably well: Derisk, Enable, and then Finish.
Derisk
The first phase of a migration is **derisking** it, and doing so as quickly and cheaply as possible. Write a design document and shop it with the teams that you believe will have the hardest time migrating. Iterate. Shop it with teams who have atypical patterns and edge cases. Iterate. Test it against the next six to twelve months of roadmap. Iterate.
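After you've evolved the design, the next step is to embed into the most challenging one or two teams and work side by side with them to build, evolve, and migrate to the new system. Don't start with the easiest migrations, which can lead to a false sense of security.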
Effective derisking is essential, because each team that endorses a migration is making a bet that you're going to get this damn thing done, and not leave them with a migration to an abandoned system that they have to revert. If you leave one migration partially finished, folks will be exceedingly suspicious of participating in the next.
Enable
Once you've validated that the solution solves the intended problem, it's time to start sharpening your tools. Many folks start migrations by generating tracking tickets for teams to implement, but it's better to slow down and build tooling to programmatically migrate the easy ninety percent. This radically reduces the migration's cost to the broader organization, which raises the success rate and creates more future opportunities to migrate.
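As a rough illustration of migrating the easy cases programmatically, here is a minimal sketch of a codemod. The function names `legacy_fetch` and `new_client.get` are hypothetical stand-ins for whatever old and new APIs a real migration involves, and a production tool would more likely work on a syntax tree than on regular expressions. The point is the shape: rewrite the simple call sites automatically and report the ones that need a human.

```python
import re

# Hypothetical rename for illustration: legacy_fetch(x) -> new_client.get(x).
# Only single-argument calls with no nested calls count as "easy" here.
SIMPLE_CALL = re.compile(r"\blegacy_fetch\(([^(),]*)\)")


def migrate_source(source: str) -> tuple[str, list[int]]:
    """Rewrite simple call sites.

    Returns the rewritten source and the 1-based line numbers that still
    mention legacy_fetch and therefore need manual attention.
    """
    rewritten_lines = []
    needs_manual_work = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        rewritten = SIMPLE_CALL.sub(r"new_client.get(\1)", line)
        if "legacy_fetch" in rewritten:
            needs_manual_work.append(lineno)
        rewritten_lines.append(rewritten)
    return "\n".join(rewritten_lines), needs_manual_work


if __name__ == "__main__":
    example = "a = legacy_fetch(url)\nb = legacy_fetch(url, timeout=3)\nc = 1"
    new_source, manual = migrate_source(example)
    print(new_source)
    print("manual follow-up on lines:", manual)
```

The list of leftover lines is what would feed any tickets you do eventually file: teams only hear about the call sites the tool could not handle.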
Once you've handled as much of the migration programmatically as possible, figure out the self-service tooling and documentation you can provide to allow folks to make the necessary changes without getting stuck. The best migration tools are incremental and reversible: folks should be able to immediately return to the previous behavior if something goes wrong, and should have the expressiveness they need to derisk their particular migration path.
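One common way to make a migration incremental and reversible is a percentage dial in front of the old and new code paths. The sketch below is an assumption about how that might look, not a description of any particular tool: `rollout_percent`, `legacy`, and `new` are hypothetical names, and the key could be a request ID, a user ID, or a service name.

```python
import hashlib
from typing import Callable


def bucket(key: str) -> int:
    """Map a key to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def use_new_system(key: str, rollout_percent: int) -> bool:
    """True if this key should take the new code path at the current dial setting."""
    return bucket(key) < rollout_percent


def handle(
    key: str,
    rollout_percent: int,
    legacy: Callable[[str], str],
    new: Callable[[str], str],
) -> str:
    """Route a single key to the legacy or new implementation."""
    if use_new_system(key, rollout_percent):
        return new(key)
    return legacy(key)


if __name__ == "__main__":
    for percent in (0, 25, 100):
        routed = [handle(f"svc-{i}", percent, lambda k: "legacy", lambda k: "new")
                  for i in range(1000)]
        print(percent, routed.count("new"))
```

Because bucketing is deterministic, raising the dial only ever adds keys to the new path, and setting it back to 0 returns everyone to the previous behavior immediately.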
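Documentation and self-service tooling are products, and they thrive under the same regime: sit down with some teams and watch them follow your directions, then improve them. Find another team, repeat. Spending an extra two days intentionally making your documentation clean and your tools intuitive can save an enormous amount of time in a large migration. Do it!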
Finish
The last phase of a migration is deprecating the legacy system you’ve replaced. This requires getting to 100% adoption, and that can be quite challenging.
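Start by stopping the bleeding, which means ensuring that all newly written code uses the new approach. That can be installing a ratchet in your linters, or updating your documentation and self-service tooling. This is always the first step, because it turns time into your friend. Instead of falling behind by default, you're now making progress by default.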
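A linter ratchet can be sketched in a few lines. This is one possible shape under stated assumptions: something else (a grep, a lint rule) has already counted legacy usages per file, and the file names and counts below are made up for illustration. The check fails if any file uses the legacy approach more than its recorded baseline allows, and the baseline is only ever tightened.

```python
def check_ratchet(baseline: dict[str, int], current: dict[str, int]) -> list[str]:
    """Return a list of violations; an empty list means the ratchet holds.

    Both arguments map a file path to its count of legacy usages.
    """
    violations = []
    for path, count in sorted(current.items()):
        allowed = baseline.get(path, 0)  # files not in the baseline get zero allowance
        if count > allowed:
            violations.append(
                f"{path}: {count} legacy usages, at most {allowed} allowed"
            )
    return violations


def tighten(baseline: dict[str, int], current: dict[str, int]) -> dict[str, int]:
    """Lower the baseline wherever usage has dropped, so progress cannot regress."""
    tightened = {}
    for path, allowed in baseline.items():
        remaining = min(allowed, current.get(path, 0))
        if remaining > 0:
            tightened[path] = remaining
    return tightened


if __name__ == "__main__":
    baseline = {"a.py": 3, "b.py": 1}
    current = {"a.py": 2, "b.py": 1, "c.py": 1}
    print(check_ratchet(baseline, current))
    print(tighten(baseline, current))
```

In this example the new file `c.py` is rejected because it was never granted an allowance, while `a.py` has its allowance lowered from 3 to 2 once someone cleans up a usage.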
Ok, now you should start generating tracking tickets, and set up a mechanism that pushes migration status to teams that need to migrate and to the general management structure. It's important to give wider management context around migrations because they are the folks who need to prioritize the migrations; if a team isn't working on a migration, it's typically because their leadership has not prioritized it.
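The status mechanism can start as a simple rollup that someone emails or posts on a schedule. The sketch below assumes you can list every service with its owning team and whether it has migrated; the `Service` record and the report format are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    team: str
    migrated: bool


def status_by_team(services: list[Service]) -> dict[str, tuple[int, int]]:
    """Return {team: (migrated_count, total_count)}."""
    totals = defaultdict(lambda: [0, 0])
    for service in services:
        totals[service.team][1] += 1
        if service.migrated:
            totals[service.team][0] += 1
    return {team: (done, total) for team, (done, total) in totals.items()}


def render_report(services: list[Service]) -> str:
    """One line per team, least-complete teams first so they are seen first."""
    rows = sorted(
        status_by_team(services).items(),
        key=lambda item: (item[1][0] / item[1][1], item[0]),
    )
    return "\n".join(
        f"{team}: {done}/{total} migrated ({100 * done // total}%)"
        for team, (done, total) in rows
    )


if __name__ == "__main__":
    inventory = [
        Service("billing-api", "payments", True),
        Service("ledger", "payments", False),
        Service("indexer", "search", True),
    ]
    print(render_report(inventory))
```

Sorting by completion puts the teams that most need a prioritization conversation at the top of the report that their leadership reads.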
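At this point you're pretty close to complete, but you still have the long tail of weird or unstaffed work. Your tool now is: finish it yourself. It's not necessarily fun, but getting to 100% will require the team leading the migration to dig into the nooks and crannies themselves.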
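My final tip for finishing migrations is around recognition. It's important to celebrate migrations while they're ongoing, but the majority of the celebration and recognition should be reserved for their successful completion. In particular, starting but not finishing migrations often incurs significant technical debt, so your incentives and recognition structure should be careful to avoid perverse incentives.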
What have you seen make migrations more effective? What are some of the anti-patterns you’ve experienced?