My skepticism towards current developer meta-productivity tools
It’s hard to write about engineering leadership in 2020 and not mention the research from Accelerate and DORA. They offer a data-driven perspective on how to improve developer productivity, which is a rather magical thing. So why aren’t more teams using them?
There are three core problems I see:
- The frightening trap of using productivity measurements to evaluate rather than to learn
- Instrumenting the productivity pipeline requires working across many different tools
- Most instrumentation and dashboarding tools force you to model the problem poorly
These create enough friction that most teams never get around to using these metrics, even the teams that know they should.
Learning over evaluation
There are quite a few engineering productivity measurement startups out there, many of which have sprung up in the past two or three years. Aren’t you an engineering manager who’d love to know who’s falling behind on velocity? Wouldn’t stack ranking in your next calibration session be easy if you had everyone’s commits per month over the past quarter?
I’m personally convinced that these companies are selling products that harm the companies that use them rather than help them. Using productivity metrics to measure individuals this way is akin to incident retrospectives that identify human error as the root cause. It’s performative: if you want to blame someone, just go ahead and blame them; don’t waste your time gathering arbitrary metrics to support it.
The real need here is capturing data to support learning, and that learning happens in aggregate. Looking at how defect rates compare across teams is useful, as long as you dig into the ways those teams differ: maybe they have different testing or code review processes, maybe they have a different mix of tenure or seniority.
As long as tooling keeps privileging the manager who wants to grade their team rather than learn from the development process, these tools will be actively distrusted by the engineers who instrument them and will create false confidence in the managers who use them. The right tool here should be designed exclusively from a learning perspective.
Instrumenting many tools
Resources like Buritica’s primer on engineering delivery metrics help give the general shape of an approach, but there are a lot of details to work through. The productivity pipeline involves so many tools that it’s awkward to instrument them in a consistent way. A small startup might use GitHub, Docker, Terraform, Jenkins, and Kubernetes, each of which introduces complicated questions to answer. What’s the unique identifier for a pull request? If our builds back up, will we deploy two pull requests in a single build? How do we distinguish a failing test from Jenkins crashing on a disk space error? Is a Kubernetes deploy complete when all the pods have been upgraded? What if one pod fails? What if it fails later due to a node failure unrelated to the code? How do we get insight into the workflow before a pull request is even created?
Each of these questions can be answered, but it takes a lot of time to work through them and tie the answers together into a cohesive view of reality. It’s also easy to instrument them in ways that create subtle misunderstandings of what’s actually happening. The sheer volume of detail discourages many teams from ever finishing the work.
The best tooling here will automatically integrate with the most commonly used tools while also providing a fully customizable client, along the lines of DataDog’s integration strategy. Even just agreeing on which unique identifiers to use throughout the process would be remarkably useful.
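To make the identifier point concrete, here’s a minimal sketch of a normalized event that each tool in the pipeline could emit; the field names and example values are illustrative rather than a prescribed schema. The join keys are the whole point: if every tool stamps the same repo, pull request number, and commit SHA, its events can be tied together later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class PipelineEvent:
    """One normalized event from any tool in the delivery pipeline.

    The join keys (repo, pr_number, commit_sha) are the important part:
    unless every tool stamps them, events can't be tied together later.
    """
    source: str          # e.g. "github", "jenkins", "kubernetes"
    event_type: str      # e.g. "pr_opened", "build_finished", "deploy_completed"
    repo: str            # e.g. "acme/api"
    pr_number: int       # GitHub pull request number
    commit_sha: str      # the commit being built or deployed
    timestamp: datetime
    attributes: dict = field(default_factory=dict)  # tool-specific detail

# A Jenkins build completion for a hypothetical pull request; a later
# Kubernetes rollout event would carry the same repo/pr_number/commit_sha,
# letting the two be joined in whatever store they land in.
build_done = PipelineEvent(
    source="jenkins",
    event_type="build_finished",
    repo="acme/api",
    pr_number=4211,
    commit_sha="9f2c1ab",
    timestamp=datetime.now(timezone.utc),
    attributes={"result": "SUCCESS", "duration_s": 312},
)
```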
Need observability over monitoring
In the monitoring and observability space, Honeycomb and Lightstep have pushed a definition of observability centered on supporting ad-hoc rather than precomputed queries. Monitoring tools like Grafana/Graphite/Statsd might push you to emit measures that are pre-aggregated to support certain queries like “what’s the p50 latency by service, by datacenter?” Observability tools like Honeycomb and Lightstep push you to emit events which are then aggregated at query-time, which supports answering the same question you’d ask Grafana, but also questions like “what are the latencies for requests to servers canarying this new software commit?” or even “show me full traces for all requests running this commit.”
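As a toy illustration of that distinction (not any particular vendor’s API), the monitoring style fixes the aggregation when the metric is emitted, while the observability style records the whole event and defers aggregation to query time:

```python
from statistics import median

# Monitoring style: aggregation is chosen at emit time, so you can only
# answer the questions you anticipated when you named the metric, e.g.
#   statsd.timing("api.request.latency.us_east", 42)

# Observability style: store the raw, wide event and aggregate at query time.
events = [
    {"service": "api", "datacenter": "us-east", "commit": "9f2c1ab", "duration_ms": 42},
    {"service": "api", "datacenter": "us-east", "commit": "5d77e02", "duration_ms": 128},
    {"service": "api", "datacenter": "eu-west", "commit": "9f2c1ab", "duration_ms": 55},
]

# "What's the median latency for requests running the canary commit?" is a
# question nobody pre-aggregated for, answered by filtering the raw events.
canary = [e["duration_ms"] for e in events if e["commit"] == "9f2c1ab"]
print(median(canary))  # 48.5
```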
Too many of these developer meta-productivity tools focus on monitoring-style solutions, which are a mediocre fit for measuring productivity and an even poorer fit for supporting learning. This is a shame, because the infrastructure constraints that push infrastructure tooling towards pre-aggregated monitoring simply don’t exist when you’re looking at human-scale events.
For example, if you’re measuring server response times across a fleet of Kubernetes nodes running ten pods per node, then you might be looking at 100 requests per second * 10 pods per node * 1000 nodes * 3 availability zones, which is three million measurements per second that you need to record. This drives tradeoffs towards reducing the quantity of data being stored. The observability infrastructure required to store all those events without aggregation, at a reasonable price, is complex and bespoke. (Roughly, my understanding is that most observability tools instead capture events into a ring buffer, evict all but the interesting events from that buffer, and aggregate the data as it transitions into more durable storage.)
However, the scale that these developer meta-productivity tools operate at is so much smaller that there’s no need to solve the underlying infrastructure problems. Just write it to MySQL or Kafka (streaming to S3 for historical data) or something. There’s no reason the underlying events shouldn’t be available to support deeper understanding.
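For a sense of how modest the storage problem is, here’s a minimal sketch of the “just write it to MySQL” option, assuming a single append-only table and the mysql-connector-python driver; the schema and names are illustrative only:

```python
import json
import mysql.connector  # assumes the mysql-connector-python package

SCHEMA = """
CREATE TABLE IF NOT EXISTS delivery_events (
    id          BIGINT AUTO_INCREMENT PRIMARY KEY,
    repo        VARCHAR(255) NOT NULL,
    pr_number   INT          NOT NULL,
    commit_sha  VARCHAR(40)  NOT NULL,
    source      VARCHAR(64)  NOT NULL,
    event_type  VARCHAR(64)  NOT NULL,
    occurred_at DATETIME     NOT NULL,
    attributes  JSON,
    INDEX idx_pr (repo, pr_number)
)
"""

def record_event(conn, event):
    """Append one raw pipeline event; any aggregation happens later, at query time."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO delivery_events"
        " (repo, pr_number, commit_sha, source, event_type, occurred_at, attributes)"
        " VALUES (%s, %s, %s, %s, %s, %s, %s)",
        (
            event["repo"], event["pr_number"], event["commit_sha"],
            event["source"], event["event_type"], event["occurred_at"],
            json.dumps(event.get("attributes", {})),
        ),
    )
    conn.commit()
    cur.close()

# Usage sketch (connection details are placeholders):
#   conn = mysql.connector.connect(host="...", user="...", password="...", database="metrics")
#   cur = conn.cursor(); cur.execute(SCHEMA); cur.close()
#   record_event(conn, {...})
```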
The basic workflow I’d want to see these systems offer is the same as a request trace. Each pull request has a unique identifier that’s passed along through the developer productivity tooling, and you can see each step of its journey in your dashboarding tooling. Critically, this lets you follow the typical trace instrumentation workflow of starting broad (maybe just the PR being created, being merged, and then being deployed) and adding more spans into that trace over time to increase insight into the problematic segments.
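As a sketch of what that trace could look like as data (the span names, granularity, and timestamps are purely illustrative), the pull request itself becomes the trace identifier and each step of its journey becomes a span:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Span:
    trace_id: str            # one trace per pull request, e.g. "acme/api#4211"
    span_id: str
    parent_id: Optional[str]
    name: str                # "awaiting_review", "ci_build", "deploy_rollout", ...
    start: datetime
    end: datetime

trace_id = "acme/api#4211"

# Start broad: a single root span covering the PR from creation to deploy.
spans = [
    Span(trace_id, "root", None, "pr_created_to_deployed",
         datetime(2020, 7, 6, 9, 0), datetime(2020, 7, 7, 15, 30)),
]

# Later, add finer-grained child spans to the segments that look problematic,
# exactly as you would when instrumenting a slow request.
spans += [
    Span(trace_id, "review", "root", "awaiting_review",
         datetime(2020, 7, 6, 9, 5), datetime(2020, 7, 7, 11, 0)),
    Span(trace_id, "ci", "root", "ci_build",
         datetime(2020, 7, 7, 11, 0), datetime(2020, 7, 7, 11, 25)),
    Span(trace_id, "deploy", "root", "deploy_rollout",
         datetime(2020, 7, 7, 14, 50), datetime(2020, 7, 7, 15, 30)),
]
```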
So, who cares?
I’m writing this as (a) a hope that there are folks out there working on this problem from this perspective, and (b) a reusable explanation of what sort of developer meta-productivity tools I’m excited about, for when folks email me for feedback/angel investment/etc.
For what it’s worth, I’m not necessarily saying I think this will be a good business. I’m confident it’s the tool that engineering leadership teams need to more effectively invest in developer productivity, but I’m less confident it’s the sort of thing people want to pay for; figuring out the go-to-market and distribution strategy is probably the hardest part of this sort of product. That points to another advantage of the observability/traces/spans approach: you can import the event history and show value immediately, instead of requiring folks to use the tool for a while to build out new metric aggregations.
That said, I think companies like GitHub are in the best position to make progress on this sort of thing, particularly as GitHub Actions takes up more space in developer workflows. (I’m not sure what Nicole Forsgren is working on in her new role at GitHub, but I have a dream that it’s related to this.)