Our world is constantly evolving and transforming at an unprecedented rate, with new data being generated every second. It’s hard to keep up! The N+1 query problem refers to the issue in database retrieval when each related entity of an object is queried separately from a database, resulting in O(n) queries, where n represents the number of related entities. However, having a broad range of solutions to draw from can be incredibly useful in navigating this complex landscape.
That’s why this article is here – to offer up some real-world examples and innovative ideas for tackling the “N+1 problem.” Whether you’re a seasoned professional or just starting out, we hope these insights will prove valuable in your quest to stay ahead of the curve.
what’s in the article
- What is the N+1 query problem?
- How to avoid your code from incorporating such performance anti-patterns?
- Process of Solving N+1 Issue
- What caused the N+1 query problem?
- Conclusion
What is the N+1 query problem?
The N+1 query problem is a common issue that arises in software development when retrieving data from a database. Essentially, it occurs when a program retrieves a collection of objects and then makes an additional query for each object to fetch related data.
This can quickly lead to a performance bottleneck, as the database is queried multiple times for the same information, causing unnecessary latency and decreasing overall efficiency. When large amounts of data are involved, the N+1 problem becomes even more pronounced, potentially leading to long load times or even system crashes.
Let us give you some examples of the N+1 query problem in action:
Let’s say you’re building an e-commerce website with a product catalog. Each product has a category associated with it, and you want to display the category name along with the product name on the product page. Without careful optimization, your code might retrieve the list of products and then make a separate query for each product to retrieve its category name. This would result in N+1 queries, where N is the number of products being displayed. As the number of products grows, this could significantly slow down your website’s performance.
Another example might be a social media platform that displays a list of users and their posts. If each post has associated comments, and your code retrieves the list of posts and then makes a separate query for each post’s comments, you could end up with the same N+1 problem.
How to avoid your code from incorporating such performance anti-patterns?
As a software developer, it’s important to be aware of performance anti-patterns like the N+1 query and take steps to avoid them in your code. By following these strategies, you can optimize your code and avoid performance anti-patterns that can slow down your application and harm the user experience.
Here are some strategies you can use to optimize your code and prevent such anti-patterns from slowing down your application.
Use eager loading
One of the most effective ways to avoid the N+1 query problem is to use eager loading. Eager loading retrieves all the necessary data in a single query, minimizing the number of round trips to the database. This can significantly improve the performance of your application, especially when working with large datasets.
Implement caching
Caching can also be a powerful tool for improving performance and avoiding anti-patterns. By storing frequently accessed data in memory or on disk, you can reduce the need to make repeated requests to the database. This can speed up your application and reduce the risk of performance problems.
Optimize database queries
When working with databases, it’s important to write efficient queries that retrieve only the needed data. Avoid selecting unnecessary columns or rows, and use indexing to speed up queries whenever possible.
Monitor performance
Finally, it’s crucial to monitor the performance of your application regularly and identify any potential performance issues early on. This can help you catch and fix anti-patterns like the N+1 problem before they can impact your users.
Process of Solving N+1 Issue
We’d like to explain and describe the process of solving the N+1 issue based on our experience. We were tasked with maintaining and developing new functionality for a project that utilized various technologies, including Symfony 4, Doctrine ORM, and a MySQL database. The overarching objective of the project was to streamline the tax calculation process and make life easier for users.
The N 1 select problem
Initially, it seemed like the task would be a breeze – there were only a handful of users and some basic data to work with, including data from various APIs. All we had to do was gather the data and crunch some numbers to arrive at the desired result.
To illustrate the task at hand, let’s consider an example scenario.
There is a User model:
- ID
- name
- password
Also, there is a Subscription model:
- ID
- plan
- user_id
Plus, there is a Payment model:
- ID
- amount
- date
- user_id
And there is a Tax model:
- ID
- parameter 1
- parameter 2
- user_id
One user could have many subscriptions, payments, taxes, etc. So there are One to Many relations. The final purpose was to calculate the final info for the admin using users’ data from payment API and the database. Let’s assume we need a Result model for the admin:
- etc.
- calculated parameter 1
- calculated parameter 2
Here are the methods we used to solve the N+1 issue
Lazy loading
We started off by utilizing the lazy loading approach. It was a simple setup, supported by Doctrine ORM, and was ideal for situations where we only needed a subset of data and its relations from the database. However, we soon encountered a problem – the response time was too long. Upon closer inspection, we realized that there were too many requests being sent to the database. This was when the N+1 issue reared its head.
Eager loading
Next, we decided to try the eager loading approach to solve the N+1 issue. Eager loading fetches all the necessary data simultaneously, making it an efficient solution. However, it didn’t work in our case due to a lack of available RAM. We encountered an “Out of Memory” error, and while it was possible to increase the memory_limit value in the php.ini file, this wasn’t the best solution as memory resources are limited.
Split getting data
In some cases, it may be feasible to split data retrieval from a database, especially if you need to return paginated data from a custom API. However, this didn’t apply to our situation as we needed to calculate data for all the users without any data omissions.
Use a database like temporary storage
To resolve the N+1 issue, we came up with a solution that involved using a database table as temporary storage during the final result calculation. Here are the steps we followed:
- We created a Result model.
- Then we created a table to store the data for the aforementioned model.
- We fetched all the data via eager loading.
- We looped through the users’ data, but instead of storing the results in RAM, we saved them to the results table in the database.
- Finally, we retrieved all the calculated data from the database in just one request.
While this solution worked, it had the potential to cause an issue with too many requests being sent to the database.
Calculate the result on the database level
Another approach we considered was to shift the task of gathering all the data and calculating the result from the application level to the database level. This could be done using stored procedures and triggers in the following manner:
- We created a Result model.
- Then created a table to store the data for the Result model.
- We set up stored procedures and triggers to handle changes in the users, payments, subscriptions, or taxes table and recalculate the data stored in the Result table accordingly.
- When the admin wants to retrieve Result data, there is no need for on-the-fly calculation since we already have the result. All that’s needed is to fetch the data from the results table and render it.
By using this approach, we can improve performance and reduce the risk of N+1 queries being executed.
What caused the N+1 query problem?
The N+1 query can arise when working with an object-relational mapping (ORM) tool, such as Doctrine ORM, to access data from a relational database. It occurs when the ORM executes N+1 database queries to retrieve related data, where N is the number of objects returned in the initial query. This can happen due to the ORM’s lazy loading behavior, where it loads related data only when accessed and not eagerly at the time of the initial query.
Conclusion
In conclusion, the N+1 query problem can significantly affect the performance of database retrieval by generating unnecessary queries and slowing down the application. However, by utilizing techniques such as eager loading, lazy loading, splitting data, and using a cache, developers can effectively solve this problem and improve the performance of their applications.
It is crucial for developers to be aware of the N+1 query problem and to implement the appropriate solutions to optimize the efficiency of their database queries. By doing so, they can ensure that their application performs well and provides a smooth user experience for their users.
We hope that the solutions mentioned above can assist programmers in tackling some of the common tasks they encounter on a daily basis.
About author
Roman Bondarenko is the CEO of EVNE Developers. He is an expert in software development and technological entrepreneurship and has 10+years of experience in digital transformation consulting in Healthcare, FinTech, Supply Chain and Logistics.
Author | CEO EVNE Developers