Laravel liming chunk collection

  • Little Intro about chunk :

Laravel comes up with an excellent Eloquent ORM. There are lot of cool things in Eloquent, one of which is chunk method.

Usually when you need to process large data, lets say you want to update users table and assign a coupon code based on 3rd party APIs.

What you can do is :

User::get()->each(function($user){

    $coupon = API::getCouponCode($user->email);
    $user->coupon = $coupon;
    $user->save();
});

Doing this is good but if you have thousands of users, loading them all up at once to do the coupon saving process. That's going to take consume a hige memory at a time and maybe the server will be exusted because so much data is stored in memory for processing at a time.

The chunk method helps these kind of implementations. Chunking records means taking a batch of records by a limit, process those, take the next batch processing those and so on... The idea is you are taking a subset of data to process at a time instead of entire data loaded into memory.

The chunk method will retrieve a "chunk" of Eloquent models, feeding them to a given Closure for processing. Using the chunk method will conserve memory when working with large result sets.

Lets do the same but with chunk this time :

$recodsPerBatch = 50;

User::chunk($recodsPerBatch, function($user){

    $coupon = API::getCouponCode($user->email);
    $user->coupon = $coupon;
    $user->save();
});

Now, as you can probably guess this will take 50 user records at a time to process, once those are completed it will take next 50 untill all records are procesed by chunk closure.

  • The main problem Limit with Chunk :

Let's apply limit to the above example :

$recodsPerBatch = 50;

User::limit(100)->chunk($recodsPerBatch, function($user){

    $coupon = API::getCouponCode($user->email);
    $user->coupon = $coupon;
    $user->save();
});

If you would think laravel will chunk 50 records in 2 batches are we are limiting the total records to 100, oops not really.

Laravel will do series of queries like below to chunk the records :

select * from `users` limit 50 offset 0
select * from `users` limit 50 offset 50
select * from `users` limit 50 offset 100
...

The chunk method ignores any previous eloquent method which applies limit. It can be limit and offset OR take and skip...

This becomes a problem for which lot of people had raised an issue on laravel's github repo. But Laravel's core dev team mentioned this is the expected behaviour and rightfully so.. Chunk itself is using limit to convert entire collection in batches.

  • And.... Here it is.. The Solution you were waiting for :

Laravel has a chunk variation called chunkById. Let's use the first example and implement chunkById now :

$recodsPerBatch = 50;

User::chunkById($recodsPerBatch, function($user){

    $coupon = API::getCouponCode($user->email);
    $user->coupon = $coupon;
    $user->save();
});

The main and only fundamental difference between chunk and chunkById is how it structures the query.

select * from `users` where `id` > 0 order by `id` asc limit 50
select * from `users` where `id` > 0 and `id` > 50 order by `id` asc limit 50
select * from `users` where `id` > 0 and `id` > 50 and `id` > 100 order by `id` asc limit 50
select * from `users` where `id` > 0 and `id` > 50 and `id` > 100 and `id` > 150 order by `id` asc limit 50
...

If you observer in the queries done by chunkById :

  1. It's adding an order by clause to the id column (By the way you can specify the column as 3rd argument to chunkById method if itis not id)
  2. It adds where id > x each time it processes the next batch
  3. There is no offset used and id column is used conceptually as an offset.

This gives you an advantage that you can add your offset in the query using id column as a limit just like chunkById is doing internallu.

Let's limit the chunk in our example with 100 records to chunk :

$recodsPerBatch = 50;

$limit = 100;

$maxId = User::orderBy('id', 'asc')->offset($limit)->limit(1)->select('id')->first()->id;

User::where('id', '<', $maxId)->chunkById($recodsPerBatch, function($user){

    $coupon = API::getCouponCode($user->email);
    $user->coupon = $coupon;
    $user->save();
});

What we did was, even if we can not use limit directly, we got an id which falls just above the limit we want. Used that to add a where clause of where('id', '<', $maxId).

This will then chunk the 100 results, with 2 batches of 50 records in each batch. Cool, isn't it!

 
 
By : Mihir Bhende Categories : laravel, php Tags : laravel, php, chunk, limit, eloquent