by Jörn Dreyer and Tom Needham
With the release of user_ldap 0.10.0 on Dec 20th 2017, we greatly improved the performance of our LDAP app.
The user_ldap app provides a user backend for ownCloud. It lets you use an existing directory of users and give them access to their data. Because of this, the user_ldap app forms an integral part of any large ownCloud instance that uses LDAP for user authentication. Understandably, performance and reliability are crucial here, and small gains can have a large impact.
Performance boost through new oc_accounts table
Since ownCloud X, we have a new database table, oc_accounts. It is populated from user backends such as LDAP and SAML/Shibboleth, and it also holds users that were created within ownCloud. To allow searching and autocompletion of users when you need them, the admin can sync these backends ahead of time.
However, with the legacy user_ldap app, this required many extra LDAP queries, so a sync of several thousand users could take hours. When an ownCloud admin triggered the initial user sync from LDAP to oc_accounts, it took five hours for 100,000 users. Too long.
Overcoming technical debt
When we dug into the code that fetches users, we found some interesting logic. First, the code would execute a query to fetch all user ids. The result set already contained things like email, displayname and quota, but they were discarded and only the user ids were returned. Afterwards, the code would fetch each necessary attribute one by one, leading to 3-5 queries per user. For 100,000 users that added up to as many as 500,000 individual queries.
In addition, any paged query in progress was cancelled, because not all LDAP servers support multiple parallel paged queries on the same connection. As a result, the initial query for all users was restarted for every page. With a default page size of 500 users, that added another 200 queries for 100,000 users. The outcome was a high load on some LDAP instances, causing delays and frustration for administrators.
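As a rough back-of-the-envelope check of these numbers, the sketch below tallies the queries the legacy sync would issue. The function names are made up for illustration, and the model assumes exactly one restarted query per page; neither is taken from the user_ldap code.

```python
import math


def page_count(users: int, page_size: int) -> int:
    """Number of result pages needed to list all users."""
    return math.ceil(users / page_size)


def legacy_query_count(users: int, attr_queries_per_user: int, page_size: int) -> int:
    """Estimate of the queries issued by the legacy sync.

    Assumption: one attribute lookup per user per attribute, plus one
    restarted listing query for every result page.
    """
    return users * attr_queries_per_user + page_count(users, page_size)


if __name__ == "__main__":
    # 100,000 users, default page size of 500, 3 to 5 attribute queries per user.
    print(page_count(100_000, 500))
    print(legacy_query_count(100_000, 3, 500))
    print(legacy_query_count(100_000, 5, 500))
```

Under these assumptions the attribute lookups dominate: the restarted page queries are a small share of the total, but each of them is a comparatively expensive full listing query.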
One query to rule them all
To remedy this we refactored the user_ldap app to fetch all necessary attributes in the initial query and cache them in a UserEntry (a small class that holds fetched attributes for a specific user). Any subsequent access to these attributes now uses that cached LDAP result, instead of resetting and running a new query.
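To make the idea concrete, here is a minimal Python sketch of the pattern. It is not the actual user_ldap implementation (which is PHP): FakeDirectory, search_all and fetch_user_entries are hypothetical names, and the directory is an in-memory stand-in that only counts how often it is queried.

```python
from dataclasses import dataclass


class FakeDirectory:
    """In-memory stand-in for an LDAP server (hypothetical, counts queries)."""

    def __init__(self, records):
        self._records = records
        self.query_count = 0

    def search_all(self, attributes):
        """Return the requested attributes for every user in one query."""
        self.query_count += 1
        wanted = ("uid", *attributes)
        return [{name: record.get(name) for name in wanted} for record in self._records]


@dataclass
class UserEntry:
    """Holds the attributes fetched for one user."""

    uid: str
    attributes: dict

    def get(self, name, default=None):
        # Served from the cached result; no further directory query.
        return self.attributes.get(name, default)


def fetch_user_entries(directory, attributes=("mail", "displayName", "quota")):
    """Fetch all needed attributes up front and wrap each row in a UserEntry."""
    rows = directory.search_all(attributes)
    return [UserEntry(uid=row["uid"], attributes=row) for row in rows]


if __name__ == "__main__":
    directory = FakeDirectory([
        {"uid": "alice", "mail": "alice@example.org", "displayName": "Alice", "quota": "5 GB"},
        {"uid": "bob", "mail": "bob@example.org", "displayName": "Bob", "quota": "1 GB"},
    ])
    entries = fetch_user_entries(directory)
    print(entries[0].get("mail"), entries[1].get("quota"), directory.query_count)
```

However many attributes are read from the entries afterwards, the query counter stays at one.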
The sync fetches 500 users per page and no longer does any additional queries to fetch each attribute. This reduces the number of queries to a single one with multiple result pages. It does not get faster than that.
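The difference between restarting a query and continuing it can be sketched as follows. This is a simplified simulation, not LDAP client code: real paged-results cookies are opaque values handed out by the server, whereas here the cookie is simply an offset into a list, and paged_search and sync_all are hypothetical names.

```python
def paged_search(records, page_size, cookie=0):
    """Return one page of results and the cookie for the next page.

    The cookie is None once the last page has been returned.
    """
    page = records[cookie:cookie + page_size]
    next_cookie = cookie + page_size
    return page, (next_cookie if next_cookie < len(records) else None)


def sync_all(records, page_size=500):
    """Walk through all pages of a single query, continuing via the cookie."""
    synced = []
    pages = 0
    cookie = 0
    while cookie is not None:
        page, cookie = paged_search(records, page_size, cookie)
        synced.extend(page)  # every attribute is already in the page
        pages += 1
    return synced, pages


if __name__ == "__main__":
    users = [f"user{i}" for i in range(1200)]
    synced, pages = sync_all(users)
    print(len(synced), pages)
```

Each loop iteration picks up where the previous page ended, so the listing query is never started over.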
This works especially well with Active Directory instances. Active Directory copes far better with one large query than with many small ones. This way, the new user_ldap app contributes to the stability of Active Directory instances.
Further Work
On top of these performance improvements, we have been developing more tools to assist administrators, including single user sync and seen user sync (for users who have logged in at least once). All of this builds on some core sync refactoring that removes code duplication and improves logging output.
Upgrade now!
While the performance requirements of the biggest customers of ownCloud GmbH drove this development, it also benefits the ownCloud community a lot. Both of us have been with ownCloud since the beginning and have taken this time to rework old legacy code to meet our high standards. We will continue to make software that meets enterprise standards but is open source.
A good reason to upgrade now! Get the new user_ldap release to benefit from the huge performance improvements:
Summary
The refactored user_ldap code enables the fastest LDAP user sync ever, limited only by network bandwidth.