|
Actually there no longer seems to be any reason for readers/writers to be stateful: they can store their state in the ExecutionContext (an exception might be a multi-threaded state, where maybe transaction resources need to be used in conjunction with ExecutionContext). Then we have a completely different situation: everything is stateless, and anyone who wants access to some state registers for a callback, e.g. with a marker interface.
The plan would be to push all state from readers/writers into the execution context and provide registration for *Listeners (which replace and supersede *Aware). Not sure if that will work (I wish it would). The only way it works is if the readers/writers hang on to a reference to the context, or if *all* methods in their interface take the context as an argument. That seems invasive, at first glance anyway.
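To make the "all methods take the context as an argument" option concrete, here is a minimal sketch. `SketchContext`, `ContextualReader` and `ListReader` are hypothetical names invented for illustration; they are not the framework's ExecutionContext or ItemReader API. The reader keeps no mutable fields, so one instance can serve several executions as long as each has its own context.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the ExecutionContext: a plain key/value store.
class SketchContext {
    private final Map<String, Integer> values = new HashMap<>();

    int getInt(String key, int defaultValue) {
        Integer value = values.get(key);
        return value == null ? defaultValue : value;
    }

    void putInt(String key, int value) {
        values.put(key, value);
    }
}

// Hypothetical reader contract in which every method takes the context.
interface ContextualReader<T> {
    T read(SketchContext context);
}

// Holds no mutable fields: the read position lives entirely in the context.
class ListReader implements ContextualReader<String> {
    private static final String POSITION = "list.reader.position";
    private final List<String> items;

    ListReader(List<String> items) {
        this.items = List.copyOf(items);
    }

    @Override
    public String read(SketchContext context) {
        int position = context.getInt(POSITION, 0);
        if (position >= items.size()) {
            return null; // end of input
        }
        context.putInt(POSITION, position + 1);
        return items.get(position);
    }
}
```

The cost is visible in the interface: the context parameter appears on every method, which is the invasiveness mentioned above.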
I split up config for TaskExecutorLauncher a bit. Heading for a more rational approach based on a new application context per run. One way to do it would be a new implementation of JobLocator that loads an ApplicationContext instead of just looking in a map.
I'm really not sure that the ItemReaders and Writers can be stateless; I think they're stateful by definition. I know we *could* put all the state in the ExecutionContext, but we've tried this sort of approach before, and it just doesn't work.
In batch, our readers have to be streaming over a large dataset, and I just don't see any advantage in trying to put that state in some other object. If you're going to do that, you might as well just put the whole reader and writer in a state object. I guess what I'm saying is that you generally can't use the same readers and writers from multiple threads. Let's give an example: JdbcCursorItemReader. If you open the cursor, put the resultSet in state, and then go at it with another thread that has a different context, you may have issues opening another very large cursor over the same dataset. It's even worse for a file, since the OS is going to lock it, and as we've said before, if you're using different JobParameters (such as a different file) it's a different JobInstance. I think we're much better off maintaining that only one thread should be touching a reader or writer at a time, rather than trying to bypass this with stack-scoped state.
(In reply to comment #6)
> I disagree that readers/writers *have* to be single-threaded in their contract.
> I agree that they are fundamentally stateful. Some clever stuff would have to
> be done to make them multi-threaded, but it isn't impossible.
I'm in agreement here. I think there is some confusion about what's thread-safe versus what's stateless. It's more than possible to create a stateful thread-safe object with proper synchronization. I think that the contract of the ItemReader/Writers is that they are thread-safe only (although I don't think that InputStream implies this, so this might not even be part of the contract; you might just have implementations that are thread-safe for use in multi-threaded environments). I think the real question of this issue lies in scoping. I don't think that an ItemReader/Writer instance can be used outside of a single step scope. Because they are generally stateful (in that they maintain internal markers/caches/buffers/etc.) and also can't easily be reset to 0 (to re-read the file, for example), I think that they've got to be instantiated in a way that scopes them on the stack. This does imply factories for some items (like the reader/writer), but following the WebFlow example, it's possible to create stateless objects that persist some information in contexts for smaller elements.
That's all so far so good. I really believe that demanding factories for ItemReader/Writer just isn't going to work, though. These are interfaces, not classes, and anyone (by design) can implement the interface and have it used appropriately by a batch job. I really don't think we should force people to write factories for their own implementations. That leaves us with a problem, which up to now I have been inclined to solve with scope="step". It seems more and more like that is not a good solution for users either.
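The distinction drawn above between thread-safe and stateless can be sketched as follows. `SimpleReader` and `SynchronizedReader` are hypothetical names used only for illustration, not the framework's interfaces. The reader is stateful (it wraps an iterator and keeps a count), yet it is safe to share between threads because every access to that state goes through a synchronized method.

```java
import java.util.Iterator;

// Hypothetical minimal reader contract, not the framework's ItemReader.
interface SimpleReader<T> {
    T read();
}

// Stateful but thread-safe: the iterator and the counter are only ever
// touched while holding this object's monitor.
class SynchronizedReader<T> implements SimpleReader<T> {
    private final Iterator<T> delegate;
    private int count;

    SynchronizedReader(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T read() {
        if (!delegate.hasNext()) {
            return null; // end of input
        }
        count++;
        return delegate.next();
    }

    public synchronized int getCount() {
        return count;
    }
}
```

Synchronization only guarantees that each item is handed out exactly once; it does nothing for restartability or ordering across threads, which is why the scoping question remains.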
I want to remove step scope if I can, and for 1.0 recommend that users simply create a new ApplicationContext for each job execution. It shouldn't be too hard to do, and all the other benefits (almost) of step scope can be obtained from the new BatchListener and ItemStream interfaces. What we do in 2.0 is another question, but surely it must come down to some sort of smart object factory. SWF is a good place to look for ideas (I like the flow variables, but hate the way you have to configure them).
Step scope is gone. All sample jobs can be run from command line or JMX since configuration was split into job-only and launcher-only. TaskExecutorLauncher works with an ApplicationContext per job execution via the new JobFactory strategy.
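The idea of a fresh context per job execution can be sketched without any Spring types. Everything below is hypothetical: `ExecutionScopedBeans` stands in for an ApplicationContext, and `SketchJobFactory` only echoes the JobFactory strategy by name and does not reproduce its actual signature. The point is that stateful collaborators are created once per execution, so implementors need no custom scope and no factory of their own.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a per-execution ApplicationContext:
// it simply owns the stateful collaborators for one run.
class ExecutionScopedBeans {
    final StringBuilder writerBuffer = new StringBuilder();
    int itemsRead;
}

// Hypothetical job contract for the sketch.
interface SketchJob {
    String run();
}

// Hypothetical strategy interface, named after the idea only.
interface SketchJobFactory {
    SketchJob createJob();
}

class FreshContextJobFactory implements SketchJobFactory {
    private final Supplier<ExecutionScopedBeans> contextSupplier;

    FreshContextJobFactory(Supplier<ExecutionScopedBeans> contextSupplier) {
        this.contextSupplier = contextSupplier;
    }

    @Override
    public SketchJob createJob() {
        // A new set of stateful beans for every job execution.
        ExecutionScopedBeans beans = contextSupplier.get();
        return () -> {
            beans.itemsRead++;
            beans.writerBuffer.append("item-").append(beans.itemsRead);
            return beans.writerBuffer.toString();
        };
    }
}
```

Two jobs obtained from the same factory never see each other's buffers or counters, which is the isolation that step scope was previously expected to provide.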
|
BATCH-361) within the current framework. Each step would need to be scope="job" to ensure strict thread safety - it would be optional for anyone using a throwaway application context for every job execution. An alternative is to scrap custom scopes and insist that users create a new ApplicationContext for each job execution. It makes StepContextAware more complicated, but it's do-able. There needs to be a registration protocol for *Aware instances, like direct injection into the Job/Step, or autodetection in the ApplicationContext. Those interfaces might then disappear in favour of *Listener (see BATCH-151, BATCH-378).
In the future we would like to have a more domain-focused way of specifying these concerns - it might take inspiration from the SWF concept of a flow variable (we'd have job and step variables). The drawback of that approach is that SWF (although it continues to improve) has a clunky model for setting things up, from a user's perspective - a flow definition file plus a beans definition file for every flow. I don't think we should burden users with that, in the long run.