Optimized dropout process - Technical Exercise Part 1 Product Engineer Achmad Ardani Prasha #1
Benchmark Comparison
Before Optimization
After Optimization

Comparison Table

| Metric | Before | After | Change |
| --- | --- | --- | --- |
| Memory usage | 624 MiB | 36 MiB | ~94% reduction |
| Execution time | 1,323,102 ms (~22 min) | 204,808 ms (~3.4 min) | ~84% faster |
| Enrollments processed / dropped out | same counts | same counts | correctness preserved |
Summary of Improvements:
Memory Usage: Dropped from 624 MiB to 36 MiB, which is roughly a 94% reduction.
Execution Time: Reduced from 1,323,102 ms (about 22 minutes) to 204,808 ms (about 3.4 minutes), an 84% faster runtime.
Correctness: The number of enrollments processed and dropped out remained the same, ensuring the process logic is preserved.
Optimization Steps
Current Problem: The DropOutEnrollments Artisan command, which processes ~500k enrollments (with 500k related exams and 300k submissions), is extremely slow (1,323,102 ms, about 22 minutes) and memory-intensive (624 MiB). This indicates inefficient querying and loading too much data into memory at once.
We can improve execution speed and reduce memory usage by applying several optimizations:
1. Select Only Required Columns
I begin by only selecting the columns I need (id, course_id, and student_id) instead of retrieving full rows. This means my query doesn’t use SELECT * – it explicitly lists required fields. By doing so, I reduce the data transferred from the database and the memory my application uses to hydrate models [1]. It’s a best practice in SQL to only fetch the necessary columns for your task, as this lowers CPU load and memory usage on both the database and application side [2]. In my code, I implement this with Eloquent’s select method:
```php
Enrollment::select('id', 'course_id', 'student_id')
```
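As a quick sanity check (illustrative only, and assuming the conventional `enrollments` table name), dumping the generated SQL shows that only the three listed columns are requested:

```php
// Illustrative: inspect the SQL Eloquent builds for this query.
// With the default `enrollments` table it produces:
//   select `id`, `course_id`, `student_id` from `enrollments`
Enrollment::select('id', 'course_id', 'student_id')->toSql();
```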
2. Bulk Fetch Related Records with Composite Keys
Next, I needed to determine which enrollments have related exam or submission records without loading all those records entirely. I achieve this by using a composite key (a combination of course_id and student_id) for Exams and Submissions. By selecting a raw concatenation of these two fields and using distinct, I get a unique list of course_id-student_id pairs. This approach minimizes the data pulled into memory – I’m only retrieving a list of keys rather than full objects. Using Eloquent’s pluck on this raw selection gives me a simple array of composite keys without hydrating full models for each record. Avoiding the creation of model instances for each row saves a lot of memory and overhead, as noted in Laravel performance tips [3]. Here’s how I do it in code:
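The sketch below shows the idea, assuming `Exam` and `Submission` models and a plain `-` separator for the composite key; flipping the plucked collection is an optional extra so later membership checks are constant-time lookups rather than linear searches:

```php
// Unique course_id-student_id pairs that have at least one exam.
// pluck() on the raw alias returns plain strings, so no Exam models are hydrated.
$examKeys = Exam::query()
    ->selectRaw("DISTINCT CONCAT(course_id, '-', student_id) AS composite_key")
    ->pluck('composite_key')
    ->flip(); // composite keys become collection keys, enabling O(1) has() checks

// Same idea for submissions.
$submissionKeys = Submission::query()
    ->selectRaw("DISTINCT CONCAT(course_id, '-', student_id) AS composite_key")
    ->pluck('composite_key')
    ->flip();
```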
By querying only these composite keys, I dramatically reduce the amount of data loaded, ensuring I work only with the identifiers I need (course–student pairs) instead of entire records.
3. Cache Timestamp per Chunk
When updating records, I use timestamps (for example, setting an updated_at field). Instead of calling the current time (now()) for every single record, I call it once per chunk of records and reuse it. Calling now() is a relatively inexpensive operation, but in a loop over thousands of records it becomes repetitive overhead. Caching the timestamp in a $now variable means I avoid redundant function calls and ensure consistency of the timestamp within that batch. This follows the general optimization principle of moving expensive or repeatable computations outside of loops [4]. In practice, I retrieve the current time at the start of processing a chunk and then use that value for all updates/inserts in that chunk, as shown below:
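A minimal sketch of that pattern, assuming the chunked processing described in step 4 (the chunk size and callback body are placeholders):

```php
Enrollment::select('id', 'course_id', 'student_id')
    ->chunkById(1000, function ($enrollments) {
        // One now() call per chunk of 1,000 records instead of one per record.
        $now = now();

        foreach ($enrollments as $enrollment) {
            // ...every update/insert prepared in this chunk reuses $now,
            // so all rows touched together share the same timestamp.
        }
    });
```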
4. Use Chunking, and Bulk Update & Insert
Finally, I process the enrollments in chunks and perform bulk updates/inserts, rather than handling one record at a time. Chunking the query (e.g., 1000 records at a time) ensures that I never load too many records into memory at once. This keeps memory usage low and makes it feasible to work through millions of records without crashing [5]. Within each chunk, I collect the IDs that need updating and prepare any new records that need inserting. Then I perform a single bulk update for that entire set and a single bulk insert for all new records. Using a set-based update with WHERE IN (...) on the collected IDs lets the database update many rows in one operation, greatly reducing the number of round trips compared to updating each row individually [6]. Likewise, inserting multiple rows in one query (as opposed to one-by-one inserts) is much faster – the MySQL documentation notes that batching many values into one INSERT can be many times more efficient than single-row inserts [7]. Below is a snippet illustrating this approach:
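The snippet below is a sketch that ties steps 1–3 together rather than the exact command code: `$examKeys` and `$submissionKeys` are the flipped key sets from step 2, while the `status` column, the `'dropped_out'` value, the `enrollment_dropout_logs` table, and the drop-out condition itself are placeholders assumed for illustration:

```php
use Illuminate\Support\Facades\DB;

Enrollment::select('id', 'course_id', 'student_id')
    ->chunkById(1000, function ($enrollments) use ($examKeys, $submissionKeys) {
        $now = now();      // step 3: one timestamp per chunk
        $dropOutIds = [];  // enrollment ids to update in one statement
        $logRows = [];     // rows to insert in one statement

        foreach ($enrollments as $enrollment) {
            $key = $enrollment->course_id . '-' . $enrollment->student_id;

            // Illustrative rule: no exam and no submission means the enrollment drops out.
            if (! $examKeys->has($key) && ! $submissionKeys->has($key)) {
                $dropOutIds[] = $enrollment->id;
                $logRows[] = [
                    'enrollment_id' => $enrollment->id,
                    'created_at'    => $now,
                    'updated_at'    => $now,
                ];
            }
        }

        if ($dropOutIds !== []) {
            // One set-based UPDATE ... WHERE id IN (...) for the whole chunk.
            DB::table('enrollments')
                ->whereIn('id', $dropOutIds)
                ->update(['status' => 'dropped_out', 'updated_at' => $now]);

            // One multi-row INSERT for the whole chunk.
            DB::table('enrollment_dropout_logs')->insert($logRows);
        }
    });
```

chunkById is used here rather than chunk() so that updating rows inside the loop cannot shift the pagination window, since pages are keyed by the primary id.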
By processing in chunks, I keep memory usage stable, and by doing bulk database operations, I minimize query overhead. This set-based processing leverages the database’s efficiency at handling multiple records in one go, rather than making thousands of individual calls, thereby dramatically improving performance. Each of these optimizations – selecting minimal columns, fetching keys in bulk, caching timestamps, and using chunked batch updates/inserts – contributes to a more efficient, scalable process.
References