Spring Data JPA’s standard repositories provide a set of methods that handle common operations used by most persistence layers. That saves us a lot of time and allows us to focus on our business logic. But we also need to ask ourselves which of these methods we want to use. You need to know how each method works internally to answer that question. That’s especially the case if multiple methods have very similar names.
Typical examples of that are the save, saveAndFlush, and saveAll methods. You can use all of them to persist one or more entity objects. But there are small differences between these methods and not all of them are a great fit for most use cases.
In this article, I will show you how those methods work and explain what that means for your persistence layer. You can use Spring Data JPA with various JPA implementations, and everything I show you in this article is independent of a specific implementation. But to make the article a little easier to read and understand, I’m using Hibernate as my JPA implementation.
But before we dive into the details of the 3 different save methods, I want to quickly show you how to find Spring Data JPA’s implementations of its standard repository interfaces. You can use that to check for yourself or to find out how other repository methods work internally.
As you probably know, you can create a repository by defining an interface that extends one of Spring Data JPA’s standard interfaces. In this example, I defined the ChessPlayerJpaRepository to manage my ChessPlayer entity.
public interface ChessPlayerJpaRepository extends JpaRepository<ChessPlayer, Long> { }
My ChessPlayerJpaRepository interface extends Spring Data JPA’s JpaRepository interface. If you check the type hierarchy for that interface in your IDE, you will find the JpaRepositoryImplementation interface. It’s part of Spring Data JPA’s SPI and all implementations of the JpaRepository need to implement it.
That, of course, also includes the standard implementation provided by Spring Data JPA. If you ask your IDE for the implementations of this interface, you will find the SimpleJpaRepository class. It implements all methods provided by the standard JpaRepository interface. If you don’t have an IDE open while reading this article, you can find the SimpleJpaRepository class on GitHub.
And when you take a closer look at some of the implemented methods, you will quickly recognize that Spring Data JPA doesn’t do anything too complex. It only uses JPA’s EntityManager to define queries, persist new entities and perform similar operations. But there are some important differences between these methods that you need to know.
Spring Data’s CrudRepository interface defines the save method. CrudRepository is a super-interface of the JpaRepository. It’s part of the Spring Data parent project and not specific to JPA. That also explains why the method name isn’t a great fit for a persistence layer based on JPA: JPA’s EntityManager doesn’t offer a save operation, only persist and merge. Unfortunately, this often results in developers misusing this method. I will get into more detail after showing you how the save method works internally.
You will find the following code when you check how the SimpleJpaRepository class implements the save method.
/*
 * (non-Javadoc)
 * @see org.springframework.data.repository.CrudRepository#save(java.lang.Object)
 */
@Transactional
@Override
public <S extends T> S save(S entity) {
    Assert.notNull(entity, "Entity must not be null.");
    if (entityInformation.isNew(entity)) {
        // new entity object: becomes managed, the INSERT is executed later
        em.persist(entity);
        return entity;
    } else {
        // existing entity object: merged into the current persistence context
        return em.merge(entity);
    }
}
As you can see, the method calls the isNew method to check if the provided entity object is a new one. I explained how that method works in my article about Spring Data JPA’s state detection. In the simplest case, the isNew method checks the version or primary key attribute of the provided entity object. If the attribute is null, the entity object is considered a new entity object that hasn’t been persisted yet.
Based on the result of this check, Spring Data JPA calls the persist method on the EntityManager to insert new entity objects into the database or the merge method to merge an existing entity into the current persistence context.
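To make this branch more tangible, here is a small, self-contained toy model of it. It is not Spring Data JPA code: the ChessPlayer class, the isNew check (which assumes an entity without a version attribute, so only the primary key is checked), and the list that records the called EntityManager operation are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the branch in SimpleJpaRepository.save(); not Spring Data code.
// ChessPlayer, isNew() and the recorded call list are hypothetical stand-ins.
class SaveDecisionSketch {

    // Simplified entity without a version attribute.
    static class ChessPlayer {
        Long id;
        String name;

        ChessPlayer(Long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Records which EntityManager operation the toy save() would call.
    static final List<String> calls = new ArrayList<>();

    // Assumption: without a version attribute, a null primary key marks a new object.
    static boolean isNew(ChessPlayer player) {
        return player.id == null;
    }

    static ChessPlayer save(ChessPlayer player) {
        if (player == null) {
            throw new IllegalArgumentException("Entity must not be null.");
        }
        if (isNew(player)) {
            calls.add("persist"); // the object becomes managed; the INSERT is deferred
            return player;
        }
        calls.add("merge"); // a real merge may return a different, managed object
        return player;
    }

    public static void main(String[] args) {
        save(new ChessPlayer(null, "Magnus"));
        save(new ChessPlayer(42L, "Hikaru"));
        System.out.println(calls); // [persist, merge]
    }
}
```

The toy only records the decision. In the real repository, the merge branch returns whatever object the EntityManager’s merge method returns.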
This is the part where it gets important to understand what Spring Data JPA does internally and how all JPA implementations work.
When you call the persist method on JPA’s EntityManager interface, your entity object changes its lifecycle state from transient to managed. As I explained in my article about JPA’s lifecycle model, this doesn’t enforce the execution of an SQL INSERT statement. It only adds the object as a managed entity to your current persistence context.
Hibernate delays the execution of the INSERT statement until it performs the next flush operation on the persistence context. When Hibernate performs a flush depends on your FlushMode configuration. By default, Hibernate does this before it executes a query or when you commit the transaction.
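The following toy persistence context sketches this deferred execution. It is a simplified model, not Hibernate’s API: the class names, the FlushMode enum, and the SQL strings are hypothetical. It assumes primary key values that don’t require an immediate INSERT (for example, sequence-generated ones; with GenerationType.IDENTITY, Hibernate has to execute the INSERT right away) and treats every query as a flush trigger in AUTO mode, while in MANUAL mode only an explicit flush writes the changes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy persistence context that defers INSERT statements until a flush.
// Hypothetical model; the names and SQL strings are not Hibernate's API.
class DeferredInsertSketch {

    enum FlushMode { AUTO, MANUAL }

    static class PersistenceContext {
        final FlushMode flushMode;
        final List<String> pendingInserts = new ArrayList<>();
        final List<String> executedSql = new ArrayList<>();

        PersistenceContext(FlushMode flushMode) {
            this.flushMode = flushMode;
        }

        // persist() only registers the object and queues its INSERT.
        void persist(String playerName) {
            pendingInserts.add("insert into chess_player (name) values ('" + playerName + "')");
        }

        // flush() sends all queued statements to the "database".
        void flush() {
            executedSql.addAll(pendingInserts);
            pendingInserts.clear();
        }

        // Simplification: in AUTO mode, every query triggers a flush first.
        void executeQuery(String sql) {
            if (flushMode == FlushMode.AUTO) {
                flush();
            }
            executedSql.add(sql);
        }

        // In AUTO mode, commit flushes; in MANUAL mode, only flush() writes changes.
        void commit() {
            if (flushMode == FlushMode.AUTO) {
                flush();
            }
        }
    }

    public static void main(String[] args) {
        PersistenceContext ctx = new PersistenceContext(FlushMode.AUTO);
        ctx.persist("Magnus");
        System.out.println(ctx.executedSql.size()); // 0: nothing sent yet
        ctx.commit();
        System.out.println(ctx.executedSql.size()); // 1: INSERT executed during commit
    }
}
```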
If you’re new to JPA, the delayed execution of SQL INSERT statements might be confusing. But it’s an efficient and reliable mechanism that enables Hibernate to apply various internal performance optimizations. The way that Spring Data JPA uses this part of the EntityManager is absolutely fine and doesn’t cause any problems.
When you call the save method with an entity object that already exists in the database, Spring Data JPA calls the merge method on the EntityManager. The internal handling of this method depends on the current lifecycle state of the entity object.
Hibernate ignores the call of the EntityManager’s merge method for all entity objects in lifecycle state managed. These are all entity objects you fetched from the database or persisted during your current Session. Hibernate already manages these entity objects. During the next flush operation, Hibernate will automatically check if any of their attributes have changed. If that’s the case, Hibernate will execute the required SQL UPDATE statements.
For those entity objects, your call of the save method only wastes a few CPU cycles to call the method on the repository and let Hibernate check the lifecycle state of the provided entity object. Unfortunately, calling the save method for managed entity objects is a common mistake in many persistence layers using Spring Data JPA.
You only need to call the save method if the lifecycle state of the provided entity object is detached. That’s often the case if you received the entity object from a client or decided to detach the entity from the current persistence context programmatically. In this case, the entity object gets merged into the persistence context.
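As I explained in a previous article, Hibernate’s implementation of the merge operation consists of 3 steps:
1. It gets the managed entity object with the same primary key from the current persistence context or, if there isn’t one, loads it from the database.
2. It copies all attribute values of the provided detached entity object to the managed one.
3. It returns the managed entity object. The object you provided stays detached, so your business code needs to continue with the returned object.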
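A toy model of the merge operation, including the shortcut for objects that are already managed, could look like the following. It’s a simplified sketch and not Hibernate’s implementation: one Map stands in for the database, another one for the persistence context, and the hypothetical selectCount field only counts the simulated SELECT statements. It also assumes that the database contains a record with the given primary key.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of JPA merge semantics; not Hibernate's implementation.
// Hypothetical: a Map as "database", another Map as persistence context.
class MergeSketch {

    static class ChessPlayer {
        final Long id;
        String name;

        ChessPlayer(Long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static final Map<Long, String> database = new HashMap<>();               // id -> name
    static final Map<Long, ChessPlayer> persistenceContext = new HashMap<>(); // managed objects
    static int selectCount = 0;

    static ChessPlayer merge(ChessPlayer player) {
        ChessPlayer managed = persistenceContext.get(player.id);
        // Already managed: merge has nothing to do.
        if (managed == player) {
            return managed;
        }
        // Step 1: use the managed instance with the same id, or load it.
        if (managed == null) {
            selectCount++; // simulated SQL SELECT
            managed = new ChessPlayer(player.id, database.get(player.id));
            persistenceContext.put(player.id, managed);
        }
        // Step 2: copy the state of the detached object to the managed one.
        managed.name = player.name;
        // Step 3: return the managed object; the argument stays detached.
        return managed;
    }

    public static void main(String[] args) {
        database.put(1L, "Magnus");
        ChessPlayer detached = new ChessPlayer(1L, "Magnus Carlsen");
        ChessPlayer managed = merge(detached);
        System.out.println(managed == detached);       // false
        System.out.println(managed.name);              // Magnus Carlsen
        System.out.println(merge(managed) == managed); // true
    }
}
```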
As I explained in the previous sections, the save method persists a new entity object in the database or merges a detached entity object into the persistence context. These are the only situations in which you should call the save method on your repository.
Unfortunately, during my audit and coaching sessions, I often see calls of the save method after an entity object was changed. This method call shows that the developer wasn’t familiar with JPA’s lifecycle model. When you call the save method with a managed entity object, Spring Data JPA tries to merge an already managed entity object into the persistence context. This doesn’t trigger any SQL UPDATE statements. Hibernate delays them until it executes the next flush operation. Calling the save method only wastes some precious resources to check the entity’s lifecycle state.
Similar to the save method, Spring Data’s CrudRepository also defines the saveAll method. And when you check its implementation in the SimpleJpaRepository class, you quickly recognize that it’s only calling the save method for each element of the provided Iterable.
/*
 * (non-Javadoc)
 * @see org.springframework.data.repository.CrudRepository#saveAll(java.lang.Iterable)
 */
@Transactional
@Override
public <S extends T> List<S> saveAll(Iterable<S> entities) {
    Assert.notNull(entities, "Entities must not be null!");
    List<S> result = new ArrayList<>();
    // one call of save, and therefore of persist or merge, per entity object
    for (S entity : entities) {
        result.add(save(entity));
    }
    return result;
}
Due to that, there’s nothing to add to the things I already explained in the previous section. The only thing I want to point out here is that it doesn’t make any difference if you call the saveAll method with an Iterable of entity objects or if you call the save method for each entity object. JPA’s EntityManager doesn’t provide a persist or merge method that handles multiple entity objects. Due to this, Spring Data JPA has to call these methods multiple times and can’t provide any performance optimizations.
The saveAll method calls the previously discussed save method for each element in the Iterable. So, as explained earlier, you should only call it with new entity objects you want to persist or detached entity objects that you want to merge into the current persistence context.
Calling the saveAll method with an Iterable of already managed entities creates an even bigger overhead than the previously discussed save method. You’re now calling the merge method for multiple managed entity objects. That requires Hibernate to check the current lifecycle state of all the provided entity objects. And if these objects are already managed, they stay in that lifecycle state, and the merge method doesn’t do anything.
From a performance point of view, the saveAndFlush method is the most critical of the 3 discussed save methods. Spring Data JPA’s JpaRepository interface defines it, and it’s the only one specific to Spring Data JPA. As you can see in the following code snippet, it combines the call of the previously discussed save method with a call of the flush method, which calls the flush method on the EntityManager.
/*
 * (non-Javadoc)
 * @see org.springframework.data.jpa.repository.JpaRepository#saveAndFlush(java.lang.Object)
 */
@Transactional
@Override
public <S extends T> S saveAndFlush(S entity) {
    S result = save(entity);
    flush();
    return result;
}
This small difference is also why the saveAndFlush method isn’t a great choice in most cases. A call of the flush method forces Hibernate to perform a dirty check on all managed entity objects. These are all entity objects you’ve fetched from the database or persisted within the context of your current Hibernate Session.
During the dirty check, Hibernate checks if you changed any attribute of a managed entity since it got fetched from the database or persisted. If that’s the case, Hibernate considers the object dirty, and Hibernate will generate the required SQL statement to persist the change in the database.
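Conceptually, the dirty check compares each managed entity object with a snapshot of its state. The following sketch models that with hypothetical PersistenceContext and ManagedEntry classes that remember the name of each ChessPlayer when it gets loaded. Hibernate’s real implementation is far more sophisticated, but the basic idea of comparing the current state with a snapshot is similar: you change an attribute, and the flush generates the UPDATE without any call of a save method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Toy dirty check: the persistence context keeps a snapshot of each managed
// object's state and compares it during flush. Hypothetical, simplified model.
class DirtyCheckSketch {

    static class ChessPlayer {
        final long id;
        String name;

        ChessPlayer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // A managed object plus the state it had when it was loaded or last flushed.
    static class ManagedEntry {
        final ChessPlayer entity;
        String snapshotName;

        ManagedEntry(ChessPlayer entity) {
            this.entity = entity;
            this.snapshotName = entity.name;
        }
    }

    static class PersistenceContext {
        final List<ManagedEntry> managed = new ArrayList<>();
        final List<String> executedSql = new ArrayList<>();

        // Simulates fetching an entity: the returned object is managed.
        ChessPlayer load(long id, String nameInDatabase) {
            ChessPlayer player = new ChessPlayer(id, nameInDatabase);
            managed.add(new ManagedEntry(player));
            return player;
        }

        // Dirty check: compare every managed object with its snapshot.
        void flush() {
            for (ManagedEntry entry : managed) {
                if (!Objects.equals(entry.entity.name, entry.snapshotName)) {
                    executedSql.add("update chess_player set name='" + entry.entity.name
                            + "' where id=" + entry.entity.id);
                    entry.snapshotName = entry.entity.name; // new baseline
                }
            }
        }
    }

    public static void main(String[] args) {
        PersistenceContext ctx = new PersistenceContext();
        ChessPlayer magnus = ctx.load(1, "Magnus");
        ctx.load(2, "Hikaru");
        magnus.name = "Magnus Carlsen"; // no save() call
        ctx.flush();
        System.out.println(ctx.executedSql); // [update chess_player set name='Magnus Carlsen' where id=1]
    }
}
```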
Depending on the number of managed entity objects, a dirty check and executing the SQL statements can take some time. That’s why the Hibernate team has put a lot of effort into optimizing the flush operation. Hibernate also tries to perform partial flush operations if possible and only triggers a flush operation if it’s absolutely necessary. That’s usually the case before executing a query or committing the transaction, but not immediately after adding a new entity object to the persistence context.
By calling the saveAndFlush method, you’re forcing Hibernate to flush the entire persistence context and preventing it from using all those optimizations. In addition, you also prevent other optimizations, like grouping identical JDBC statements into a JDBC batch. To make it even worse, all of this happens independently of the outcome of the save method. Suppose you call the saveAndFlush method with an already managed entity object. In that case, you’re still forcing a flush of the persistence context even though the merge method, which was called by the save method, returned immediately after checking the lifecycle state.
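A rough toy cost model illustrates the difference. It’s an assumption-laden sketch, not a benchmark: each flush is assumed to dirty-check every managed entity object and to send all pending INSERT statements, which could only share a JDBC batch if they get executed during the same flush. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy cost model: every flush dirty-checks all managed objects and sends the
// pending INSERTs. The numbers are illustrative counts, not measurements.
class FlushCostSketch {

    static class PersistenceContext {
        int managedEntities = 0;
        int pendingInserts = 0;
        int dirtyChecks = 0;
        final List<Integer> statementsPerFlush = new ArrayList<>();

        // What save() does for a new entity object: manage it, queue the INSERT.
        void persist() {
            managedEntities++;
            pendingInserts++;
        }

        void flush() {
            dirtyChecks += managedEntities;          // inspect every managed object
            statementsPerFlush.add(pendingInserts);  // statements that could share a batch
            pendingInserts = 0;
        }
    }

    static PersistenceContext withSaveAndFlush(int n) {
        PersistenceContext ctx = new PersistenceContext();
        for (int i = 0; i < n; i++) {
            ctx.persist();
            ctx.flush(); // saveAndFlush forces a flush after each entity object
        }
        ctx.flush();     // flush during commit
        return ctx;
    }

    static PersistenceContext withSave(int n) {
        PersistenceContext ctx = new PersistenceContext();
        for (int i = 0; i < n; i++) {
            ctx.persist(); // save only queues the INSERT
        }
        ctx.flush();       // single flush during commit
        return ctx;
    }

    public static void main(String[] args) {
        PersistenceContext a = withSaveAndFlush(3);
        PersistenceContext b = withSave(3);
        System.out.println(a.dirtyChecks + " " + a.statementsPerFlush); // 9 [1, 1, 1, 0]
        System.out.println(b.dirtyChecks + " " + b.statementsPerFlush); // 3 [3]
    }
}
```

In this model, the number of dirty checks grows quadratically with the number of entity objects when you flush after each one, and every flush contains only a single INSERT statement, so there’s nothing left to group into a batch.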
The saveAndFlush method calls the save method and forces a flush of the entire persistence context afterward. That prevents several of Hibernate’s performance optimizations and slows down your application. Due to that, you should avoid using the saveAndFlush method and call the save method instead.
One of the very few situations in which you might want to call the saveAndFlush method is a persistence layer that uses FlushMode.MANUAL. You then need to explicitly tell Hibernate when you want to flush the current persistence context. One way to do that is to call the saveAndFlush method when persisting a new entity object. But when you do that, please keep in mind that you don’t need to flush your persistence context after every new entity. You should call the save or the saveAll method for most of your entity objects and only use the saveAndFlush method if you also need to trigger a flush operation.
When using the JpaRepository, you can choose between 3 different save methods.
Spring Data’s CrudRepository defines the save and saveAll methods. The saveAll method calls the save method internally for each of the provided entity objects. Both methods enable you to persist new entity objects or merge detached ones. Please keep in mind that Hibernate automatically persists all changes on managed entity objects. You don’t need to call any method to trigger an update of a managed entity.
Spring Data JPA’s JpaRepository defines the saveAndFlush method. It internally calls the save method and forces a flush of the persistence context afterward. As I explained in this article, the Hibernate team has put a lot of effort into optimizing the flush operation and the timing when this operation gets triggered. By forcing a flush operation after persisting a new entity object, you’re preventing Hibernate from applying these optimizations, and that slows down your application.