Key: BATCH-858
Type: Sub-task
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Dave Syer
Reporter: Dave Syer
Votes: 0
Watchers: 0
Spring Batch
BATCH-679

Pause / resume of Job

Created: 01/Oct/08 05:52 PM   Updated: 30/Oct/08 07:02 AM
Component/s: Core
Affects Version/s: 1.1.0
Fix Version/s: 2.0.0.M3

Time Tracking:
Original Estimate: 2d
Remaining Estimate: 0d
Time Spent: 1.5d

File Attachments: 1. Zip Archive mylyn-context.zip (48 kB)

Issue Links:
Depends
 
Related
 


Description
Pause / resume of Job. A job needs to be able to pause and resume, e.g. to wait for operator validation before completion, or to support asynchronous remote execution.

Comments
Robert Kasanicky added a comment - 21/Oct/08 11:49 AM
There is now a working implementation of the pause functionality. I tried to do this in a straightforward manner first, and now it's time to look at how much design mess it brought and clean it up if necessary.

There are at least two issues that make me wonder:
- the logic concerned with JobExecution creation/continuation is now in two places (JobLauncher and JobRepository)
- pausing and interruption work the same way, and AbstractJob distinguishes between the two by looking at the JobExecution's status, which seems a little fishy.

Dave Syer added a comment - 21/Oct/08 01:02 PM
Looking at JobExecution.status is probably the best way we have to do this - it is a framework enum, and really it was designed for exactly this kind of use case. So I am OK with that part - using BatchStatus as a message to reduce coupling is what it amounts to. ConditionalJob (or similar) will need to consume the same kind of message.

I'll probably form an opinion about the create/continuation logic when I start using it.

Robert Kasanicky added a comment - 22/Oct/08 05:54 AM
If we decide to address the concerns discussed in the comments, I suppose they will be worth separate issues, so I'm resolving this one.

Robert Kasanicky added a comment - 22/Oct/08 06:07 AM
BATCH-886 is a follow-up to address the "the logic concerned with JobExecution creation/continuation is now in two places (JobLauncher and JobRepository)" point from the previous comments.

Lucas Ward added a comment - 24/Oct/08 05:10 PM
Robert,

I was doing a review of this today and I have a couple of questions for you. In the SimpleJobLauncher, you implemented pause such that, if the status of the last execution is Paused, you just reuse it. This contrasts with how we handle restart, where we create a new JobExecution. I understand what you were trying to do, since it seems like pausing should allow for resuming. However, you lose information. If you wanted to know what time the execution was paused, or started, you don't really know. I suppose you could look at the last step to finish, and the first one to start, but even then, if you resumed fairly quickly, there's no guarantee. If you look at how we restart a step, it's the same thing (unless I'm mistaken): we create a new StepExecution, and the only difference is that the new one has the ExecutionContext of the old. I think we should remain consistent and create a new JobExecution in the Resume scenario as well.
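To make the audit-trail argument concrete, here is a minimal sketch of the "new execution on resume" approach. The types and method names (ResumeSketch, Execution, pause, resume) are hypothetical stand-ins invented for illustration; they are not the Spring Batch classes or the code under review. The only point is that the paused record keeps its own start and end times while the new record inherits the context.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; none of these are Spring Batch classes.
final class ResumeSketch {

    enum Status { STARTED, PAUSED }

    /** One row in the audit trail: each run of the job gets its own record. */
    static final class Execution {
        final long id;
        final Instant startTime;
        final Map<String, Object> context;
        Instant endTime;                 // stays null while the execution is running
        Status status = Status.STARTED;

        Execution(long id, Instant startTime, Map<String, Object> context) {
            this.id = id;
            this.startTime = startTime;
            this.context = context;
        }
    }

    /** Marks the execution as paused and records when that happened. */
    static void pause(Execution execution, Instant now) {
        execution.status = Status.PAUSED;
        execution.endTime = now;
    }

    /**
     * Resumes by creating a new execution that copies the paused one's context.
     * The paused record is left untouched, so its start and end times survive.
     */
    static Execution resume(Execution paused, long newId, Instant now) {
        if (paused.status != Status.PAUSED) {
            throw new IllegalStateException("Execution " + paused.id + " is not paused");
        }
        return new Execution(newId, now, new HashMap<>(paused.context));
    }

    public static void main(String[] args) {
        Execution first = new Execution(1L, Instant.parse("2008-10-24T09:00:00Z"), new HashMap<>());
        first.context.put("lastCompletedStep", "step1");
        pause(first, Instant.parse("2008-10-24T09:30:00Z"));

        Execution second = resume(first, 2L, Instant.parse("2008-10-24T17:00:00Z"));
        System.out.println("paused at " + first.endTime + ", resumed at " + second.startTime);
        System.out.println("carried over: " + second.context);
    }
}
```

Reusing the same record instead would force the resume path to overwrite or clear the end time, which is exactly the information loss described above.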

The second thing I noticed is that you check for Pausing as being analogous to interruption. Meaning, in the checkForInterruption method of SimpleJobRepository, you also check to see if the Execution was set to Pause, and if so, set the StepExecution to terminateOnly, which will cause the job to effectively interrupt. It was my understanding that Pausing meant that a step finished successfully, but the job should end anyway. I thought this was a major distinction between interruption and pause: pause will wait for the current step to finish and then pause the job, whereas interruption will stop the step at its next chunk boundary. Furthermore, because the Job is inspecting a returned Step for a status of PAUSED, and the Step is simply checking for isTerminateOnly and returning as STOPPED, I don't see how it could ever work. In fact, I did a search, and no Step implementation even references BatchStatus.PAUSED. I think the code should be modified to check for pause after a step completes and abend the Job as PAUSED. (Also, nulling out the end time in the SimpleLauncher is a bad idea, IMHO.)
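The distinction drawn above (interruption takes effect at the next chunk boundary, pause only once the current step has finished) can be sketched as a small simulation. Everything here is an assumption made for illustration: PauseVsInterruptSketch, Signal, Result and run are hypothetical names, and the loop is not how AbstractJob or any Step implementation is actually written.

```java
// Hypothetical simulation for illustration; not the Spring Batch implementation.
final class PauseVsInterruptSketch {

    enum Status { COMPLETED, STOPPED, PAUSED }

    enum Signal { NONE, PAUSE, INTERRUPT }

    /** Final job status plus how many chunks were processed before the job ended. */
    static final class Result {
        final Status status;
        final int chunksDone;

        Result(Status status, int chunksDone) {
            this.status = status;
            this.chunksDone = chunksDone;
        }
    }

    /**
     * Runs `steps` steps of `chunksPerStep` chunks each. The given signal is
     * raised right after chunk `atChunk` of step `atStep` (both zero-based)
     * has been processed.
     */
    static Result run(int steps, int chunksPerStep, Signal signal, int atStep, int atChunk) {
        Signal pending = Signal.NONE;
        int chunksDone = 0;
        for (int s = 0; s < steps; s++) {
            for (int c = 0; c < chunksPerStep; c++) {
                // Chunk boundary: an interrupt stops the step before the next chunk.
                if (pending == Signal.INTERRUPT) {
                    return new Result(Status.STOPPED, chunksDone);
                }
                chunksDone++; // "process" one chunk
                if (s == atStep && c == atChunk) {
                    pending = signal;
                }
            }
            // Step boundary: a pause is honoured only after the step has finished.
            if (pending == Signal.PAUSE) {
                return new Result(Status.PAUSED, chunksDone);
            }
        }
        return new Result(Status.COMPLETED, chunksDone);
    }

    public static void main(String[] args) {
        Result paused = run(3, 4, Signal.PAUSE, 0, 1);
        Result stopped = run(3, 4, Signal.INTERRUPT, 0, 1);
        System.out.println(paused.status + " after " + paused.chunksDone + " chunks");
        System.out.println(stopped.status + " after " + stopped.chunksDone + " chunks");
    }
}
```

With the same signal timing, the paused run finishes all four chunks of the first step before ending, while the interrupted run ends after two chunks, partway through that step.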

Dave Syer added a comment - 25/Oct/08 05:12 AM
Lucas: look at https://springframework.svn.sourceforge.net/svnroot/springframework/spring-batch/trunk/src/site/apt/cases/pause.apt. I see the point about the audit trail, but in that case we need another feature, because we need a PartitionHandler to be able to pause / resume if it wants to without creating hundreds of JobExecutions. Also, a step should be able to pause if it wants to (although, as you point out, the main use case that we discussed previously has been a pause after a step).