The Story of PowerJob

Hello, everyone! This is Salieri, the author of PowerJob. Today I would like to share the story of PowerJob with you. I will explain where the idea of creating a new job-scheduling framework came from and how it grew.

Preface

The story started a year and a half ago, when I went to Alibaba for a summer internship. Luckily, the first formal task I was assigned was related to job scheduling. At that time, a new job-scheduling middleware called SchedulerX 2.0 had been developed and put into production, and business tasks that formerly relied on DTS were required to migrate to this new framework. A senior alumnus handed the migration to me, and from then on I worked with this brand-new distributed computing and job-scheduling middleware.

Once the migration was complete, the business tasks ran quite well. To be honest, SchedulerX 2.0 was cutting-edge in its design concepts. For instance, parameters could be passed from the console or the OpenAPI to the executor at runtime, and the executor could accept them and behave accordingly, which made the framework quite flexible. Moreover, with MapReduce processors, developers only needed to write a few lines of code to complete distributed computing or handle big-data tasks.

The honeymoon period with SchedulerX 2.0 did not last long. As the Double 11 shopping festival approached, two problems bothered me.
On the one hand, because business data grew so fast during Double 11, business tasks that had been running well began to fail frequently. Sometimes the alert phone calls were even more frequent than the WeChat alerts. With the help of the developers involved, we concluded that the failures were caused by a lack of system resources. Other business applications took too much memory, which left too few resources for running SchedulerX 2.0 tasks. To be frank, SchedulerX was not to blame for the problem. It is reasonable that tasks need a certain amount of system resources to run safely. Take computer games as an example: if you install Windows on a MacBook Air and want to play PUBG, you will not even see the welcome page. Why? The requirements of the game are clearly stated, and nobody but yourself is to blame for not meeting them. My brain hurt as I had no choice but to rob Peter to pay Paul to get through Double 11.

On the other hand, another problem resulted from rate limiting. To monitor the running status of tasks, I had started another application that polled for task status. The application ran well until one day, when I made some minor changes and released it to the production environment. When I checked the logs on the online logging platform, I was shocked. A full screen of RuntimeException left me in despair. Had I mistakenly deleted a module or the database? Had I released the wrong branch? I calmed down once I had ruled out these possible causes. By analysing the exception messages, I finally found the cause of the problem. The API that I used to poll for task status was being rate limited. The explanation was that the API and SchedulerX had been dedicated to fully supporting Double 11, so non-core applications had to give way. Failing to get support from SchedulerX, I was compelled to write my own task tracker.
The SchedulerX team is not to blame for these problems. To serve all the business lines, a rate limiter is a must for the APIs, and excessive use of an API should certainly be restricted. However, in a middle-platform architecture, it is not rare that some special needs cannot be satisfied. For most developers, being able to run tasks just by adding a simple Jar dependency and writing some code is friendly enough. Few companies are like Alibaba, where millions of tasks require scheduling.

My internship ended with the conclusion of Double 11. I left Alibaba and switched to relaxing mode. Each day was filled with sleeping, eating, and playing computer games. Occasionally, at midnight, I would regret wasting time and tell myself to study hard the next day. Mixed feelings, somehow.

After I had lived this aimless life for several months, the dissertation finally saved me. To get my bachelor's degree, I had to work hard. By the time I finished the dissertation, the epidemic had almost eased. The guys I used to play computer games with had gone off to their internships, and we could no longer make up a full team of five. I began reading novels, which was one of my hobbies.

After reading numerous books, I finally made the big decision: to write a new job-scheduling framework based on SchedulerX. It was something that I had often wanted to do but kept putting off due to procrastination. My thought was that, even if one day SchedulerX could not support my applications, I would have a Plan B. The framework, first called OhMyScheduler and later renamed PowerJob, was born.

Meeting with PowerJob

PowerJob was born with the wish to become the next generation of distributed computing and job-scheduling frameworks. Although a long road lies ahead, I have taken the first step anyway.

Review of job-scheduling frameworks

As programmers, we are all familiar with CRON tasks, such as Linux’s crontab. Job scheduling has long been a common business requirement in the enterprise, and Java has several excellent scheduling frameworks.

Among the current scheduling frameworks, Quartz, Elastic-Job and XXL-Job are the popular ones. I would like to share my own opinions about them.

Quartz serves as the first generation. It has influenced almost every framework that came after it. It provides no web interface, so developers have to schedule jobs by writing code against its API, which is not friendly to beginners. Quartz offers only limited cluster support, so resources in the cluster cannot be fully utilized. Meanwhile, Quartz and its processors have to live in the same application, so it cannot be used to build a platform-level scheduling service. Elastic-Job was built on top of Quartz. You may refer to its official documentation for details.

XXL-Job serves as the second generation. It makes up for several of Quartz's flaws and was a great framework in its time. However, it has obvious disadvantages in the following aspects:

  • Supports MySQL only. Using any other database requires modifying the source code.
  • Limited computing ability. It supports static sharding only, which can hardly cope with complex tasks.
  • Lacks workflow support. Tasks with complex dependencies cannot be supported.
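To make the "static sharding" limitation concrete, here is a minimal sketch of the general idea in plain Java. It is not XXL-Job's or PowerJob's API; the class name, the method names and the modulo rule are assumptions chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a generic static-sharding scheme, not any framework's real API.
class StaticShardingSketch {

    // Every item is pinned to a shard by a fixed rule (id modulo shard count).
    static int shardOf(long itemId, int shardTotal) {
        return (int) Math.floorMod(itemId, (long) shardTotal);
    }

    // Returns the items that the worker holding shardIndex is responsible for.
    static List<Long> itemsForShard(List<Long> itemIds, int shardIndex, int shardTotal) {
        List<Long> mine = new ArrayList<>();
        for (long id : itemIds) {
            if (shardOf(id, shardTotal) == shardIndex) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Long> orders = List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L);
        int shardTotal = 3; // assumed to be fixed when the job is configured
        for (int shard = 0; shard < shardTotal; shard++) {
            System.out.println("shard " + shard + " -> " + itemsForShard(orders, shard, shardTotal));
        }
    }
}
```

Because the partitioning is decided up front, a worker whose shard happens to contain the expensive items cannot hand work over to idle workers, and the shard count cannot follow the size of the task at runtime.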

As the waves of the Yangtze River push on the waves ahead, so each new generation surpasses the last. PowerJob was designed for exactly these scenarios.

Highlights of PowerJob

PowerJob serves as the third generation of job-scheduling frameworks. On top of the basic scheduling abilities, PowerJob has the following features.

  • Simple to use: PowerJob provides a friendly web front end that allows developers to visually manage tasks, monitor tasks, and view logs online.
  • Complete timing strategy: PowerJob supports four scheduling strategies: CRON expression, fixed frequency, fixed delay, and OpenAPI.
  • Various execution modes: PowerJob supports four execution modes: stand-alone, broadcast, Map, and MapReduce. The Map and MapReduce modes are worth special mention: with a few lines of code, developers can take full advantage of PowerJob’s distributed computing ability.
  • Complete workflow support: PowerJob supports DAG-based online task configuration. Developers can arrange tasks on the console, and data can be passed between the tasks in a workflow.
  • Extensive executor support: PowerJob supports multiple kinds of processors, including Spring Beans, ordinary Java objects, Shell, Python and so on.
  • Simple in dependency: PowerJob aims to keep its dependencies minimal. The only required dependency is a database (MySQL / Oracle / Microsoft SQL Server…), with MongoDB as an extra dependency for storing log files online.
  • High availability and performance: Unlike traditional job-scheduling frameworks that rely on database locks, the PowerJob server is lock-free. PowerJob supports unlimited horizontal expansion. It’s easy to achieve high availability and performance by deploying as many PowerJob server instances as you need.
  • Quick failover and recovery support: Whenever a task fails, the PowerJob server retries it according to the configured strategy. As long as there are enough nodes in the cluster, failed tasks can eventually execute successfully.
  • Convenient to run and maintain: PowerJob supports online logging. Logs generated by the workers are transferred to the console and displayed there in real time, which reduces the cost of debugging and significantly improves efficiency.
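The Map and MapReduce modes mentioned above follow a general pattern: split one large task into sub-tasks, run them in parallel, then merge the partial results. The sketch below shows that pattern with a plain JDK thread pool standing in for the worker nodes. It is not PowerJob's processor API; the class, the method names and the "count the records" workload are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: JDK threads stand in for worker nodes; this is not PowerJob's API.
class MapReduceSketch {

    // "Map": split one big range of record IDs into sub-task ranges [from, to).
    static List<long[]> split(long from, long to, long batchSize) {
        List<long[]> batches = new ArrayList<>();
        for (long start = from; start < to; start += batchSize) {
            batches.add(new long[] {start, Math.min(start + batchSize, to)});
        }
        return batches;
    }

    // Hypothetical per-batch work: simply count the records that would be updated.
    static long processBatch(long[] range) {
        return range[1] - range[0];
    }

    // Runs every sub-task on the pool, then "reduces" the partial results into one total.
    static long run(long from, long to, long batchSize, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (long[] range : split(from, to, batchSize)) {
                partials.add(pool.submit(() -> processBatch(range)));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // reduce step: sum the partial counts
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("records processed: " + run(0, 1_000, 128, 4));
    }
}
```

Unlike static sharding, the number of sub-tasks here is decided at runtime from the size of the input, so whichever worker is free picks up the next batch.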

Applicable scenarios

  • Timed tasks, for example, allocating e-coupons at 9 AM every morning.
  • Broadcast tasks, for example, broadcasting to the cluster to clear logs.
  • MapReduce tasks, for example, speeding up a job such as updating large amounts of data.
  • Delayed tasks, for example, processing overdue orders.
  • Customized tasks, triggered with OpenAPI.
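For the timed-task scenario above ("9 AM every morning"), the core calculation any scheduler has to perform is "when is the next trigger, and how long until then?". The following is a rough sketch of that calculation using only `java.time`. It is not how PowerJob computes triggers; the class and method names are hypothetical, and time zones and daylight-saving changes are deliberately ignored.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Illustrative only: computes when a "same time every day" task should fire next.
class DailyTriggerSketch {

    // Next occurrence of timeOfDay strictly after now.
    static LocalDateTime nextTrigger(LocalDateTime now, LocalTime timeOfDay) {
        LocalDateTime candidate = now.toLocalDate().atTime(timeOfDay);
        return candidate.isAfter(now) ? candidate : candidate.plusDays(1);
    }

    // How long the scheduler would have to wait before firing.
    static Duration delayUntilNext(LocalDateTime now, LocalTime timeOfDay) {
        return Duration.between(now, nextTrigger(now, timeOfDay));
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        LocalTime nineAm = LocalTime.of(9, 0);
        System.out.println("next run: " + nextTrigger(now, nineAm));
        System.out.println("wait for: " + delayUntilNext(now, nineAm));
    }
}
```

A delayed task, such as closing an overdue order, is the same idea with a one-off delay measured from the moment the order is created instead of a recurring time of day.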

Team PowerJob member.