Scheduled task practice in distributed scenarios

Background

Applications often need to run certain tasks on a regular schedule, which is easy to implement with Spring's @Scheduled annotation.

However, projects at Goose today are rarely deployed as a single instance. At least two instances are needed to support releases without service interruption, and deploying a dozen or even several dozen is not unusual.

So when we write a scheduled task, we need to consider whether every instance will run it when it fires, and whether that would affect the business.

Cases where running on every instance has no impact:

  • The task is tied to the instance: the code is the same, but the execution logic or the data operated on differs, for example when each process handles only the work assigned to it
  • The task does not modify shared data
  • The task modifies shared data, but the operation is idempotent (executing it many times has the same effect as executing it once)

In the opposite cases, running on every instance does have an impact.
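To make the idempotency condition concrete, here is a minimal sketch. The SharedCounter class is hypothetical and exists only for illustration: resetTo is safe to run on every instance, while increment is not.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical shared data, used only to illustrate idempotency.
class SharedCounter {
    private final AtomicLong value = new AtomicLong();

    // Idempotent: running it on one instance or on ten leaves the same result.
    void resetTo(long target) {
        value.set(target);
    }

    // Not idempotent: every extra instance that runs the task changes the result.
    void increment() {
        value.incrementAndGet();
    }

    long get() {
        return value.get();
    }
}
```

A task that only calls resetTo can stay on plain @Scheduled; a task that calls increment needs the lock described in the next section.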

Solutions

The right to execute each task is controlled by an exclusive lock: an instance must acquire the lock before it can execute the task, and it releases the lock when execution finishes. The lock must live in a resource that every instance can access, which can be implemented with MySQL, Redis, and so on.

Because all instances request this shared resource, a service is needed to receive those requests.

Goals

Replace @Scheduled with a custom annotation, @SyncJob, that gives a task synchronized execution across the distributed deployment (only one execution at any given time), with the same timing rules as @Scheduled.

The design below follows from this goal.
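As a rough idea of what such an annotation could look like, here is a minimal sketch. The attribute names are assumptions that mirror the cron, fixedDelay and fixedRate attributes of @Scheduled, and the id attribute and the ReportJobs class are hypothetical; the real @SyncJob may be defined differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical definition; attribute names are modeled on Spring's @Scheduled.
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME) // must stay visible to reflection at runtime
@interface SyncJob {
    String id() default "";       // optional custom unique identifier
    String cron() default "";     // cron expression, as in @Scheduled
    long fixedDelay() default -1; // ms between the end of one run and the start of the next
    long fixedRate() default -1;  // ms between successive starts
}

// Example usage on a hypothetical business class.
class ReportJobs {
    @SyncJob(cron = "0 0 2 * * *") // Spring-style six-field cron: every day at 02:00
    public void buildDailyReport() {
        // business logic that must run on only one instance at a time
    }
}
```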

Architecture design

As the "resource center", the DB needs the following fields:

ID : the unique identifier of the task, from which the specific method to execute can be determined

Status : the task's execution status, either pending or executing

Start time of the current execution : updated each time an execution starts; together with the status, it is the condition checked by the CAS (compare-and-swap) operation

End time of the current execution : updated at the end of each execution; needed if intervals measured from the end of the previous execution are to be supported
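The CAS condition on status and start time can be sketched in memory as follows. JobRow, JobRowStore and the SQL in the comment are hypothetical stand-ins for the real table, whose name and columns are not given here.

```java
import java.util.concurrent.atomic.AtomicReference;

enum JobStatus { PENDING, EXECUTING }

// Hypothetical in-memory stand-in for one row of the task table.
final class JobRow {
    final String id;            // unique task identifier
    final JobStatus status;     // pending or executing
    final long lastStartMillis; // start time of the current or last run
    final long lastEndMillis;   // end time of the last finished run

    JobRow(String id, JobStatus status, long lastStartMillis, long lastEndMillis) {
        this.id = id;
        this.status = status;
        this.lastStartMillis = lastStartMillis;
        this.lastEndMillis = lastEndMillis;
    }
}

final class JobRowStore {
    private final AtomicReference<JobRow> row;

    JobRowStore(JobRow initial) {
        row = new AtomicReference<>(initial);
    }

    JobRow read() {
        return row.get();
    }

    // Plays the role of a conditional update such as (hypothetical schema):
    //   UPDATE job SET status = 'EXECUTING', start_time = :now
    //   WHERE id = :id AND status = 'PENDING' AND start_time = :startTimeReadEarlier
    boolean tryStart(JobRow seen, long nowMillis) {
        if (seen.status != JobStatus.PENDING) {
            return false;
        }
        JobRow next = new JobRow(seen.id, JobStatus.EXECUTING, nowMillis, seen.lastEndMillis);
        // Succeeds only if nobody replaced the row since it was read.
        return row.compareAndSet(seen, next);
    }
}
```

If two instances read the same row and both call tryStart, only the first succeeds; the second sees that the row has changed and gives up.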

The service provides the following operations:

  • register: registers the scheduled task's information with the "scheduled task service". The most important piece is a unique identifier for the method. It can be customized, or derived from the application name + the fully qualified class name + the method name (overloads could be distinguished too, but that is not necessary)
  • query: returns the "to-be-executed" tasks for the current instance
  • lock: acquires execution permission for the "current execution round" of the target task (if another instance grabbed the lock first and released it after executing, and the next execution time has not yet arrived, the lock must not be granted)
  • unlock: releases the lock
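The four operations can be sketched with an in-memory class. InMemoryScheduleCenter is a hypothetical simplification that supports only fixed intervals; a real service would sit in front of MySQL or Redis and be called over the network.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of register, query, lock and unlock.
class InMemoryScheduleCenter {
    private static final class Job {
        final long intervalMillis;
        boolean executing;
        long nextRunMillis; // earliest time the next round may start

        Job(long intervalMillis) {
            this.intervalMillis = intervalMillis;
        }
    }

    private final Map<String, Job> jobs = new HashMap<>();

    // register: store the task only if it does not exist yet
    synchronized void register(String id, long intervalMillis) {
        jobs.putIfAbsent(id, new Job(intervalMillis));
    }

    // query: of the given task ids, return those that are idle and due
    synchronized List<String> query(List<String> ids, long nowMillis) {
        List<String> due = new ArrayList<>();
        for (String id : ids) {
            Job job = jobs.get(id);
            if (job != null && !job.executing && nowMillis >= job.nextRunMillis) {
                due.add(id);
            }
        }
        return due;
    }

    // lock: granted only if nobody holds the task and this round has not run yet
    synchronized boolean lock(String id, long nowMillis) {
        Job job = jobs.get(id);
        if (job == null || job.executing || nowMillis < job.nextRunMillis) {
            return false;
        }
        job.executing = true;
        job.nextRunMillis = nowMillis + job.intervalMillis; // this round is now taken
        return true;
    }

    // unlock: release the task so a later round can be locked
    synchronized void unlock(String id) {
        Job job = jobs.get(id);
        if (job != null) {
            job.executing = false;
        }
    }
}
```

Advancing nextRunMillis inside lock is what enforces the "current execution round" rule: an instance that arrives after another has already locked, executed and unlocked is still refused until the next round is due.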

Process Design

  1. Register task information; this is done automatically at startup
  2. Query the tasks to be executed on the current instance, polling every 1s
  3. Acquire execution permission for the target task (lock)
  4. Execute the task (invoke the method annotated with @SyncJob through reflection)
  5. Release execution permission (unlock)
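Steps 2 to 5 can be sketched as a single polling round. ScheduleClient, LocalJob and JobPoller are hypothetical names; the sketch assumes job methods take no arguments and that something else (for example a scheduled executor) calls pollOnce once per second.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;

// Hypothetical client-side view of the schedule service.
interface ScheduleClient {
    List<String> queryDue();     // step 2
    boolean lock(String jobId);  // step 3
    void unlock(String jobId);   // step 5
}

// A job found at startup: the bean and its annotated method.
final class LocalJob {
    final Object bean;
    final Method method;

    LocalJob(Object bean, Method method) {
        this.bean = bean;
        this.method = method;
    }
}

class JobPoller {
    private final ScheduleClient client;
    private final Map<String, LocalJob> localJobs;

    JobPoller(ScheduleClient client, Map<String, LocalJob> localJobs) {
        this.client = client;
        this.localJobs = localJobs;
    }

    // One polling round: query, lock, execute through reflection, unlock.
    void pollOnce() {
        for (String jobId : client.queryDue()) {
            LocalJob job = localJobs.get(jobId);
            if (job == null || !client.lock(jobId)) {
                continue; // unknown here, or another instance got the lock
            }
            try {
                job.method.invoke(job.bean); // step 4
            } catch (ReflectiveOperationException e) {
                System.err.println("Job " + jobId + " failed: " + e);
            } finally {
                client.unlock(jobId); // always release, even if the job failed
            }
        }
    }
}
```

Unlocking in a finally block keeps a failing job from holding the lock; it does not help when the process itself dies, which is covered in the pitfalls below.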

Technical solutions

  • Thanks to Spring Boot auto-configuration, the feature is enabled simply by adding one Maven dependency
  • When the application starts, scan all beans that have @SyncJob-annotated methods and register them with ScheduleService
  • When the application starts, push the task information of the current application to schedule-service, which persists it to the database (if it does not already exist)
  • Poll for tasks to be executed (a request to schedule-service once per second), check the execution conditions (cron expression, specified interval, and other rules), grab the lock, execute, and unlock
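The task information pushed at startup needs a unique identifier per method. One possible way to derive it from the application name, class name and method name is sketched below. JobIds and the output format are assumptions; appending the parameter types is optional and only serves to tell overloads apart.

```java
import java.lang.reflect.Method;
import java.util.StringJoiner;

// Hypothetical helper: builds "app:fully.qualified.Class#method(paramTypes)".
class JobIds {
    static String of(String appName, Method method) {
        StringJoiner params = new StringJoiner(",", "(", ")");
        for (Class<?> type : method.getParameterTypes()) {
            params.add(type.getName());
        }
        return appName + ":" + method.getDeclaringClass().getName()
                + "#" + method.getName() + params;
    }
}
```

Because the identifier is derived from names, renaming the class or method creates a new task record; a custom identifier avoids that.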

Auto-configuration

This is a capability provided by Spring Boot; the various starters in the Spring ecosystem are built on it.

You only need to add a Maven dependency. When the application starts, it automatically loads the specified configuration classes from the package and creates the specified beans, so we don't have to write a pile of repetitive bean-creation code in our projects.

Add the file resources/META-INF/spring.factories (Spring Boot 2.7 and later read auto-configuration classes from META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports, and Spring Boot 3 no longer reads them from spring.factories):

org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
com.xxx.SyncJobConfig

A project that references this dependency will, at startup, create and manage the beans defined in the SyncJobConfig class.

SyncJobConfig needs to create two important beans:

  1. A custom bean-scanning class that implements the BeanPostProcessor interface. Spring calls its methods every time a bean is created, so it can be used to find beans with @SyncJob methods and keep them in a "collection" for later use. Job information can also be registered with schedule-service here (or after startup)
  2. A scheduled-task execution class, responsible for the core process: polling, locking, executing, unlocking...
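The scanning logic of the first bean can be sketched without Spring on the classpath. SyncJobCollector and CleanupTasks are hypothetical, and @SyncJob is declared here in a minimal form only so the block stands alone. In a real project the body of inspect would go into BeanPostProcessor#postProcessAfterInitialization(Object bean, String beanName).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal stand-in for the custom annotation.
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
@interface SyncJob {
}

// Hypothetical collector holding the logic a BeanPostProcessor would run.
class SyncJobCollector {
    private final Map<String, Method> found = new LinkedHashMap<>();

    // Called once per bean; remembers every method annotated with @SyncJob.
    Object inspect(Object bean, String beanName) {
        for (Method method : bean.getClass().getDeclaredMethods()) {
            if (method.isAnnotationPresent(SyncJob.class)) {
                found.put(beanName + "#" + method.getName(), method);
            }
        }
        return bean; // a post-processor must hand the bean back
    }

    Map<String, Method> found() {
        return found;
    }
}

// Hypothetical business bean used to demonstrate the scan.
class CleanupTasks {
    @SyncJob
    public void purgeExpired() {
    }

    public void helper() {
    }
}
```

If beans are wrapped in AOP proxies, bean.getClass() may return the proxy class rather than the original; Spring's AopUtils.getTargetClass is one way to reach the original class in that case.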

Pitfalls to avoid

The lock is not released because the application shuts down or restarts

Q: What if someone redeploys? The application is shut down halfway through a task and the lock is not released. After the restart, that job's record will never be returned by the query again.

A: Define a bean destruction method (@PreDestroy) on the scheduled-task execution class. The framework calls it automatically when the application shuts down, and the cleanup is done there.

// For example
import java.util.concurrent.TimeUnit;
import javax.annotation.PreDestroy; // jakarta.annotation.PreDestroy on Spring Boot 3+

public class ScheduleService {
    // ... irrelevant code omitted (declarations of running, jobExecutor, and logger)
    @PreDestroy // Shutting down the application destroys the bean, which invokes the method carrying this annotation
    public void shutdown() {
        running = false; // Simple and crude: a flag that makes the polling loop skip its work
        jobExecutor.shutdown(); // Stop accepting new tasks
        try {
            // Wait for all running jobs to finish, or time out
            boolean ok = jobExecutor.awaitTermination(120, TimeUnit.SECONDS);
            logger.warn("ScheduleService {}", ok ? "stopped cleanly" : "timed out waiting for jobs");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Restore the interrupt flag
            logger.warn("ScheduleService interrupted while waiting for jobs: {}", e.getMessage());
        }
    }
}

The application process is force-killed and the lock is not released

Q: When someone kills the process, the server crashes, or some other extreme situation occurs, the application dies abruptly, with no time for a graceful exit. What if the lock is not released then?

A: Nothing but manual intervention.

You can build a console page, which can also do more than this. If that is too much work, you can add a backdoor endpoint or modify the database directly.

Brief schedule-service outages

When schedule-service restarts, crashes, or suffers a network failure, a database error, or a similar accident, the business systems cannot communicate with the center and cannot tell whether a task may be executed. In that case it is best not to execute it, and to wait patiently or raise an alert.

Task registration failed : the application fails to start or cannot execute tasks; wait for the service to recover

Resource request failed : the task cannot be executed; wait for the service to recover

Resource release failed : the task still cannot be executed after the service recovers, because the lock was never released; manual intervention is required

A failed release ultimately needs manual intervention, but a few measures can reduce the manual work:

  1. Put it in a local queue and try again after a while
  2. Put it in a remote queue (such as one of the various MQs), and let a dedicated service handle the retries
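The local-queue option can be sketched as follows. UnlockRetryQueue is a hypothetical class, and the unlock call is passed in as a Predicate that returns true when schedule-service accepted the release.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical local retry queue for unlock requests that could not be delivered.
class UnlockRetryQueue {
    private final Queue<String> pending = new ArrayDeque<>();
    private final Predicate<String> unlockCall; // true when the service accepted the unlock

    UnlockRetryQueue(Predicate<String> unlockCall) {
        this.unlockCall = unlockCall;
    }

    // Try to unlock now; remember the job id if the service cannot be reached.
    synchronized void unlockOrQueue(String jobId) {
        if (!tryUnlock(jobId)) {
            pending.add(jobId);
        }
    }

    // Meant to be called periodically: retry each queued id once and
    // return how many are still waiting.
    synchronized int retryPending() {
        int attempts = pending.size();
        for (int i = 0; i < attempts; i++) {
            String jobId = pending.poll();
            if (!tryUnlock(jobId)) {
                pending.add(jobId);
            }
        }
        return pending.size();
    }

    private boolean tryUnlock(String jobId) {
        try {
            return unlockCall.test(jobId);
        } catch (RuntimeException e) {
            return false; // for example a network failure
        }
    }
}
```

The queue lives in memory, so it is lost if the process dies before the retry succeeds; that is the case the remote queue is meant to cover.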

Disadvantages

  • Strong dependency on schedule-service: if it goes down, connected applications cannot start, or their scheduled tasks cannot execute after startup

※ What can be done about it?

  • The lock may be lost: for example, when the application process is killed, the running task is interrupted and the lock is never released ※ Manual intervention ※ Release the lock automatically after a timeout to reduce the impact
  • Timing precision is not high: because filtering, locking, executing, and unlocking are driven by a once-per-second poll, the error can be on the order of a second ※ Not a big problem
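The timeout-based release mentioned for lost locks can be sketched as a lease. LeaseLock is a hypothetical in-memory class; in practice the owner and the expiry time would be stored in the shared resource (MySQL, Redis, and so on).

```java
// Hypothetical lease-style lock: a holder that dies stops blocking others
// once its lease has expired.
class LeaseLock {
    private String owner;         // instance currently holding the lock, or null
    private long expiresAtMillis; // lease deadline of the current holder

    // Acquire if the lock is free or the previous holder's lease has expired.
    synchronized boolean tryLock(String instanceId, long nowMillis, long leaseMillis) {
        if (owner != null && nowMillis < expiresAtMillis) {
            return false;
        }
        owner = instanceId;
        expiresAtMillis = nowMillis + leaseMillis;
        return true;
    }

    // Only the current holder may release; an unlock from an instance whose
    // lease was taken over by someone else is ignored.
    synchronized boolean unlock(String instanceId) {
        if (!instanceId.equals(owner)) {
            return false;
        }
        owner = null;
        return true;
    }
}
```

The lease has to be longer than the longest expected run of the task. If a task outlives its lease, a second instance can start the same task while the first is still running, so this reduces the impact of a lost lock rather than removing the need for care.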
