Why you should be using Spring Batch for batch processing

Giuliana Bezerra
Jan 17, 2020

Spring Batch is a framework that emerged from the need to perform batch processing. The statement “data is the new oil” should sound familiar. The cloud stores a huge amount of data, and that amount keeps growing. It is therefore important that systems can query and store this data in a timely manner without impacting the user experience.

Historically, Cobol was the first language to become famous for batch processing. Cobol systems are still used today by banks and public agencies. They are hosted on platforms known as mainframes, which have large storage and processing capacity. For a long time Cobol remained the dominant batch-processing technology thanks to its simplicity and efficiency compared with other solutions of the time, until Accenture brought together several development patterns that gave rise to the first version of Spring Batch. The framework has evolved over time into the solution we know today, which integrates with Spring Boot.

Web systems vs. Batch systems

Although the need for batch processing is clear, there is not much material on the subject; web systems (frontend and backend) are a far more widely discussed topic. The two kinds of system differ greatly, mainly in how they handle data. The data a batch system works on may not be available right away. Closing a credit card bill is a good example: it can only happen after the last purchase date of the billing cycle.

Another striking difference is data volume. Web systems do not usually process large amounts of data at any given moment. Batch systems, on the other hand, commonly process colossal volumes (millions or billions of records), often under complex business rules. E-commerce is a good example. A purchase is not usually approved for shipping automatically: the customer has a deadline to reconsider it, and credit card payment authorization is not immediate either. A system that performs the shipping at a later time is therefore desirable.

Unlike web systems, batch systems have no user interface, so their execution must be triggered by other means (scheduling, manual execution, etc.). Once started, processing requires no manual intervention and ends when it reaches its configured termination condition. Data synchronization routines benefit from this, since they must run periodically to keep systems that use different databases integrated.

Motivation

Batch systems might be the ideal solution for scenarios such as:

• Extract, Transform, Load (ETL): ETLs are common in integration scenarios. For example, an application may periodically generate files whose data must be read, transformed, and loaded into another application's database.
• Data Migration: Most companies have legacy systems. When adopting a new system, they have to migrate legacy data to a database compatible with it. This is where Spring Batch shines, with its reading, processing, and writing components.
• Parallel Processing: In some scenarios, simply making each operation faster is not enough because of the large volume of data; the only way to reduce the total time is to process more operations in parallel. This requires robust transaction control and failure-recovery mechanisms, which are complex to implement, but luckily Spring Batch takes care of that.
• Task orchestration: Jobs can be complex, so they are often split across different applications. To ensure that the overall operation is performed correctly, these applications must be orchestrated, which is a relatively simple task with Spring Cloud Data Flow.
• 24/7 processing: When data arrives constantly, triggering a full processing run for every new record would be inefficient. Ideally, you should have a mechanism to increase throughput, such as grouping records into larger units of work or processing them concurrently, and Spring Batch lets you configure that as well.
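Several of these scenarios (ETL, data migration, 24/7 processing) rest on the same idea: items are read one at a time, optionally transformed or filtered, and written in chunks, with one commit point per chunk instead of one per item. The sketch below illustrates that chunk-oriented loop in plain Java so it runs without any dependencies. It is not Spring Batch's API: the class `ChunkEtl`, the method `run` and the `toCents` processor are hypothetical, and the `Iterator`, `Function` and `Consumer` parameters only loosely play the roles of Spring Batch's `ItemReader`, `ItemProcessor` and `ItemWriter`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java sketch of chunk-oriented processing (hypothetical, not the Spring Batch API).
class ChunkEtl {

    // Reads up to chunkSize items, transforms each one, and passes the surviving
    // items to the writer as one chunk. Returns the number of chunks read.
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        int chunks = 0;
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>();
            for (int i = 0; i < chunkSize && reader.hasNext(); i++) {
                O out = processor.apply(reader.next());
                if (out != null) chunk.add(out); // null means "filter this item out"
            }
            // One writer call per chunk (a commit point, in Spring Batch terms).
            if (!chunk.isEmpty()) writer.accept(chunk);
            chunks++;
        }
        return chunks;
    }

    // Hypothetical processor: turns "name,amount" into "NAME=amount*100" and
    // filters out lines whose amount is missing or not an integer.
    static String toCents(String line) {
        String[] parts = line.split(",");
        try {
            return parts[0].trim().toUpperCase() + "=" + Integer.parseInt(parts[1].trim()) * 100;
        } catch (NumberFormatException | ArrayIndexOutOfBoundsException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("alice,10", "bob,x", "carol,25", "dave,7", "erin,3");
        List<List<String>> written = new ArrayList<>();
        int chunks = run(lines.iterator(), ChunkEtl::toCents, written::add, 2);
        System.out.println(chunks + " chunks written as " + written);
    }
}
```

Returning `null` from the processor to drop an item mirrors how a Spring Batch `ItemProcessor` filters records. In the real framework each chunk is also written inside a transaction, which this sketch does not model.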

Batch: Cobol vs. Java Spring Batch

This is a controversial issue. Companies that already have batch systems up and running are resistant to change. What would motivate a company to rewrite its systems in another technology? Here are some advantages of the Java Spring Batch approach over Cobol.

1. Maintainability

Adding new functionality should be simple, since companies are not happy to assign their web developers to arduous maintenance of batch programs, which are not end-user products. Have you ever had to maintain a Cobol program? If so, you know how painful the process can be. You have to configure a mainframe emulator and ask someone with mainframe access for the code, which must include every program involved in the functionality under analysis. Worse, there are no automated tests: you test a few scenarios on the emulator and hope for the best. With Spring Batch, things are simpler. It runs on the JVM, so any machine can execute the code. In addition, the framework simplifies development by providing testing and exception-handling tools, logging, and everything else in the Java and Spring ecosystem.

2. Flexibility

In a typical legacy setup, Cobol code runs only on a mainframe or an emulator, which makes the solution inflexible. With Java, you can run the batch system on any machine with a JVM, and you can reuse and share logic across Java systems.

3. Usability

Although batch systems have no user interface, usability still applies to the code itself. Because these systems run periodically, they need to be designed to make monitoring and bug detection easy, which saves the company operational resources.

4. Scalability

As mentioned earlier, batch systems often process large amounts of data. As the volume grows, a single process may no longer finish within the required time. In that situation, the work should be divided so it can be performed in parallel. The mainframe is fast but has no robust parallelism features. Spring Batch, on the other hand, already handles complex concerns such as transaction control and the orchestration of multiple batch instances.
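The idea of splitting a large job so its parts run in parallel can be illustrated with a minimal plain-Java sketch of partitioning: the input is cut into contiguous slices, each slice is processed on its own worker thread, and the partial results are combined at the end. Spring Batch offers its own mechanisms for this (for example, a `Partitioner` that divides the work among worker steps, or multi-threaded steps); the `PartitionedSum` class below is hypothetical, and summing amounts merely stands in for real per-record business logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical partitioning sketch using only the JDK (not Spring Batch's API).
class PartitionedSum {

    // Splits the index range [0, size) into `partitions` contiguous [start, end)
    // slices whose lengths differ by at most one.
    static List<int[]> partition(int size, int partitions) {
        if (partitions <= 0) throw new IllegalArgumentException("partitions must be positive");
        List<int[]> ranges = new ArrayList<>();
        int base = size / partitions, extra = size % partitions, start = 0;
        for (int p = 0; p < partitions; p++) {
            int len = base + (p < extra ? 1 : 0);
            ranges.add(new int[] {start, start + len});
            start += len;
        }
        return ranges;
    }

    // Processes each slice on its own worker thread, then combines the partial results.
    static long process(long[] amounts, int partitions) {
        List<int[]> ranges = partition(amounts.length, partitions);
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int[] range : ranges) {
                partials.add(pool.submit(() -> {
                    long subtotal = 0;
                    for (int i = range[0]; i < range[1]; i++) subtotal += amounts[i];
                    return subtotal;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) total += f.get(); // waits; rethrows worker failures
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partitions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] amounts = new long[1_000];
        for (int i = 0; i < amounts.length; i++) amounts[i] = i + 1;
        System.out.println(process(amounts, 4)); // sum of 1..1000
    }
}
```

Because each partition owns a disjoint index range, the workers never share mutable state, so no locking is needed, and `Future.get` surfaces any worker failure to the caller.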

5. Availability

Running a batch system can be costly in terms of computational resources, so it should not impact the availability of other systems that consume the processed data. Identifying peak times, so that executions can be scheduled around them, becomes extremely important and ensures that processing finishes within the available time window. With Spring Batch you can schedule batch executions, stop them if necessary, and restart them without affecting data integrity.
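Safe restarts depend on recording progress only after work has actually been written. Spring Batch persists this kind of state in its job repository. The plain-Java sketch below imitates the idea with an in-memory checkpoint. The `RestartableJob` class, its `Checkpoint` holder and the `failAt` parameter used to simulate a crash are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical restart sketch: progress is tracked per chunk in an in-memory checkpoint.
class RestartableJob {

    // Stand-in for a job repository: remembers how many items are committed.
    static final class Checkpoint {
        int committed = 0;
    }

    // Processes items from the last checkpoint onwards, one chunk at a time.
    // The checkpoint advances only after a whole chunk has been written, so a crash
    // part-way through a chunk leaves no half-written chunk behind.
    // failAt simulates a crash at that item index (-1 = never crash).
    static void run(List<String> items, int chunkSize, Checkpoint cp, List<String> sink, int failAt) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        for (int start = cp.committed; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            List<String> chunk = new ArrayList<>();
            for (int i = start; i < end; i++) {
                if (i == failAt) {
                    throw new IllegalStateException("simulated crash at item " + i);
                }
                chunk.add(items.get(i).toUpperCase()); // the "processing" step
            }
            sink.addAll(chunk);   // write the whole chunk...
            cp.committed = end;   // ...then record progress
        }
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e");
        Checkpoint cp = new Checkpoint();
        List<String> sink = new ArrayList<>();
        try {
            run(items, 2, cp, sink, 3); // first attempt crashes inside the second chunk
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + ", committed=" + cp.committed + ", written=" + sink);
        }
        run(items, 2, cp, sink, -1);    // restart resumes from the last committed item
        System.out.println("committed=" + cp.committed + ", written=" + sink);
    }
}
```

Rerunning after the simulated crash resumes from the last committed position, so items in chunks that were already written are not processed twice. In a real job, the write and the checkpoint update would share one transaction; here they simply run back to back.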

6. Security

Although batch systems are not exposed to attacks the way traditional web systems are, there are still security requirements to meet, the main one being data integrity. Is the processed data being validated? If the batch is stopped, does it recover without compromising data integrity? Is sensitive data stored securely? Is access to external systems secure, without exposing sensitive data? Spring Batch provides the means to address all these concerns; the developer remains responsible for choosing the right components and configurations for each scenario.

7. Support

The Java and Spring communities are huge, so it is easy to get answers to questions and to find Spring Batch applications that follow best practices.

8. Cost

Mainframes are expensive; companies know that, but they still need them to run their Cobol programs. By contrast, a Spring Batch job runs on cheaper hardware (you only need a computer running Java), the framework is easy to use, and it requires no paid support or licenses for development.

As you may have noticed, Spring Batch has several advantages and deals with important batch-processing concerns, so developers can stay focused on what really matters to them: the system's business rules.

Conclusion

This post has discussed the Spring Batch framework and why you should be using it in your applications. We have focused on what Spring Batch can do for you, not how. For the how, I suggest reading the Spring Batch documentation, which is quite complete and includes practical examples of common implementation scenarios.

I also have a YouTube channel where I talk about Spring Batch and software development, so be sure to check it out!

