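Options for orchestrating periodic tasks
Running tasks periodically and reliably is hard. Achieving the normally expected behavior of running "exactly once" requires either a singleton instance (introducing a single point of failure) or a leadership mechanism to determine which instance should run (which is why primitives like leader election, via something like Chubby, can be so powerful).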
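Leader election is the usual way out of the singleton trap. The sketch below assumes a lock service that can atomically grant a time-limited lease (the role Chubby, ZooKeeper or etcd would play). `InMemoryLeaseStore`, `tick` and the `nightly-report` job name are hypothetical, and the in-process dictionary only simulates the atomicity a real service would provide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class InMemoryLeaseStore:
    """Hypothetical stand-in for a lock service (Chubby, ZooKeeper, etcd).

    A real service makes try_acquire atomic across machines; here a single
    process-local dict plays that role.
    """

    def __init__(self) -> None:
        self._leases: dict[str, Lease] = {}

    def try_acquire(self, name: str, candidate: str, now: float, ttl: float) -> bool:
        current = self._leases.get(name)
        # Grant if the lease is free, expired, or already ours (a renewal).
        if current is None or current.expires_at <= now or current.holder == candidate:
            self._leases[name] = Lease(holder=candidate, expires_at=now + ttl)
            return True
        return False


def tick(store: InMemoryLeaseStore, instance_id: str, now: float, ttl: float = 30.0) -> Optional[str]:
    """Called periodically on every instance; only the current lease holder runs the job."""
    if store.try_acquire("nightly-report", instance_id, now, ttl):
        return f"{instance_id} runs the job"
    return None
```

If instance `a` stops renewing (say it crashes), instance `b` takes over once the lease expires, so the failure window is bounded by the TTL. A lease only limits how many instances believe they are leader at once; surviving a leader that pauses past its TTL additionally needs fencing tokens or idempotent tasks.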
Before going too deep, a few definitions:
- Scheduling is deciding when and whether a task should run.
- Orchestration is deciding where and how a task should run.
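To make the split concrete, here is a minimal Python sketch. `Task`, `Host`, `should_run` and `place` are hypothetical names, the fixed-interval rule stands in for a real schedule, and "pick the host with the most free CPU" is just one plausible placement policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Task:
    name: str
    interval: timedelta
    cpu: float            # CPU the task needs
    image: str            # hypothetical container image to run
    paused: bool = False


@dataclass
class Host:
    name: str
    free_cpu: float


def should_run(task: Task, last_run: Optional[datetime], now: datetime) -> bool:
    """Scheduling: decide *when* and *whether* the task runs."""
    if task.paused:
        return False
    return last_run is None or now - last_run >= task.interval


def place(task: Task, hosts: list[Host]) -> Optional[tuple[str, str]]:
    """Orchestration: decide *where* (a host) and *how* (an image) the task runs."""
    candidates = [h for h in hosts if h.free_cpu >= task.cpu]
    if not candidates:
        return None
    best = max(candidates, key=lambda h: h.free_cpu)
    return best.name, task.image
```

Spreading onto the emptiest host is the simplest rule; a bin-packing orchestrator would instead choose the fullest host that still fits. Either way, that decision lives entirely on the orchestration side.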
Even once you have the ability to correctly schedule tasks, you still need a second mechanism to orchestrate them somewhere, and doing this effectively requires a fairly significant amount of coupling between the scheduler and orchestrator. For example, whether a task completed successfully is information held by the orchestrator, but the conditions under which a task should be restarted are potentially behavior you'd want determined by the scheduler, especially for tasks that run long (e.g. do you want front-of-line blocking behavior or not?).
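One way to picture that coupling: the scheduler decides what to do about a new run based on status that only the orchestrator knows. The sketch below borrows the three `concurrencyPolicy` values of Kubernetes CronJobs (`Allow`, `Forbid`, `Replace`) as vocabulary; the `decide` function and its action strings are hypothetical and not any system's actual API.

```python
from enum import Enum


class ConcurrencyPolicy(Enum):
    # Value names mirror Kubernetes CronJob's concurrencyPolicy field.
    ALLOW = "Allow"      # start the new run alongside any still-running one
    FORBID = "Forbid"    # skip the new run while a previous run is active
    REPLACE = "Replace"  # stop the active run, then start the new one


def decide(policy: ConcurrencyPolicy, previous_running: bool) -> list[str]:
    """Scheduler-side decision driven by run status reported by the orchestrator.

    Returns the (hypothetical) actions the scheduler asks the orchestrator to take.
    """
    if not previous_running or policy is ConcurrencyPolicy.ALLOW:
        return ["start"]
    if policy is ConcurrencyPolicy.FORBID:
        return []  # front-of-line blocking: the slow run holds up this slot
    return ["stop_previous", "start"]
```

Note that `previous_running` has to come from the orchestrator while the policy belongs to the scheduler, which is exactly why the two components end up tightly coupled.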
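We've been chatting more about this problem space at work, and have been exploring the options in it for a while, so I decided to look around a bit.
First, some thoughts on the features we want: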
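- Language agnostic - we want a framework that can run periodic tasks for all the programming languages we use, rather than deploying one framework per language.
- Familiar deployment paradigm - getting deployment right (with code review, linting, rollbacks, etc.) is hard, and we'd prefer to use a single deployment paradigm and mechanism for periodic and long-running processes if possible. This is important both from a leverage perspective (we can improve everyone's experience in one place) and from a training and adoption perspective.
- Reliable - these are business-critical tasks, and both the scheduling and orchestration components need to be reliable and predictable.
- Reuse - ideally we could use the same orchestration layer for both our periodic tasks and our long-running processes. This would reduce our maintenance overhead, let us build operational expertise faster, and open the door to further fleet-efficiency optimizations such as bin packing.
- No vendor lock-in - ideally we'd find a solution that doesn't require vendor lock-in, e.g. proprietary cloud solutions from AWS, GCP or Azure.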
With those features in mind, I spent some time digging around for common solutions:
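- AWS Lambda with Scheduled Events is a cloud solution that should address the most straightforward use cases, from both a scheduling and an orchestration perspective. It isn't particularly flexible on either front, but it does give you the same primitives as cron, and if you happen to use one of its supported languages (Node.js, Java, C#, Python), it may well be sufficient. (You could also use a hybrid Amazon EC2 Container Service and AWS Lambda approach if you need more flexibility in the orchestration layer.)
- Google's Cloud Functions can be paired with the App Engine Cron Service, coordinating through Google Cloud Pub/Sub, to get more or less the same scheduling behavior as AWS Lambda with Scheduled Events, albeit with more moving pieces. Cloud Functions are still somewhat limited, supporting only the Node.js runtime today, but one imagines they will add more support over time. (You could of course get creative and call out to a running service, allowing you to break out of Cloud Functions' language limitations.)
- Chronos is a scheduler that runs on top of Mesos. It handles both the scheduling and orchestration aspects and offers good flexibility in both. Running Mesos is a bit heavy, but this is certainly an option if you already have the operational expertise to run Mesos.
- Kubernetes' Cron Jobs give you a solution similar to Chronos, except running on Kubernetes instead of Mesos, which suits organizations that have already deployed Kubernetes.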
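- cron is still used pretty frequently as a scheduler, and if you run it in a prebaked AMI in an Auto Scaling group with a size of one instance, then you can rely on the ASG to "elect" that single instance. You do have a single point of failure, but it will recover relatively quickly. What you don't have is any orchestration primitive, so you still need to integrate cron with a second system that handles the orchestration side (e.g. spinning up containers on Amazon EC2 Container Service or calling AWS Lambda). On the plus side, you can use your existing server imaging and deployment strategy.
- Python Celery is used by many Python shops for this kind of functionality, although it suffers from most of the same scheduling challenges as cron, and its orchestration is both fairly naive and restricted to Python (although I just found a Go implementation of Celery workers, which was a surprising find). There are a bunch of other similar solutions in this category, both in the Python space and in other languages.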
- Dkron is purely a scheduler, designed to provide fault-tolerant scheduling even when some nodes fail (e.g. it solves the leader-election and leader-handoff problems for you, as opposed to building your own on top of ZooKeeper, etc.). It also provides a nice UI, although depending on your security and compliance needs, that UI may be a mixed blessing.
- Bistro is also in this space, but feels more targeted at running jobs once for every resource in a fleet, as opposed to running a job once somewhere on some resource in a fleet.
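Several of these options, cron itself and Kubernetes' Cron Jobs among them, express the "when" of scheduling as a five-field cron expression. As a rough sketch of that primitive, the hypothetical `cron_matches` below checks whether a given minute matches an expression. It supports only `*`, single values, ranges, steps on `*` or a range, and comma lists (no names like `MON`, no `@daily`-style shortcuts), and it follows classic cron in OR-ing day-of-month and day-of-week when both are restricted.

```python
from datetime import datetime

# (min, max) for minute, hour, day of month, month, day of week (0 = Sunday)
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field such as '*/15', '1-5' or '0,30' into a set of values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        elif step_text:
            raise ValueError(f"step needs '*' or a range: {part!r}")
        else:
            start = end = int(base)
        if start < lo or end > hi or start > end or step < 1:
            raise ValueError(f"out of range: {part!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) matches a 5-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    minute, hour, dom, month, dow = (
        parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, RANGES)
    )
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    dom_ok = when.day in dom
    dow_ok = cron_weekday in dow
    # Classic cron: when both day fields are restricted, either one may match.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok
    return when.minute in minute and when.hour in hour and when.month in month and day_ok
```

A scheduler built on this would wake once a minute and hand whatever matches to the orchestrator; where and how the task then runs is still entirely the orchestrator's problem.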
Of those options, it feels like larger companies will likely end up with either a cloud-based solution (AWS Scheduled Lambdas or App Engine Cron), Chronos if they're already happy running Mesos, or Kubernetes' Cron Jobs if they're already happy running Kubernetes.