The foundation of any batch system is its processing strategy. Before choosing one, be clear about what you need from the batch system, then adopt the strategy that meets that goal. Not every batch system needs to be complex or distributed, and not every data store needs to be Hadoop. If your underlying store is a relational database and you need to run batch operations against it, Spring Batch may be a better fit than Spark, for example.
Several factors affect the choice of a processing strategy. They include (but are not limited to):
- Estimated batch data volume
- Concurrent execution alongside other batch or online/realtime ...