Mix.install([
{:youtube, github: "brooklinjazz/youtube"},
{:hidden_cell, github: "brooklinjazz/hidden_cell"},
{:tested_cell, github: "brooklinjazz/tested_cell"},
{:benchee, "~> 1.1"},
{:utils, path: "#{__DIR__}/../utils"}
])
Ensure you type the ea
keyboard shortcut to evaluate all Elixir cells before starting. Alternatively, you can evaluate the Elixir cells as you read.
Recursion is often associated with divide and conquer algorithms.
Divide and conquer algorithms recursively split the problem into many similar sub-problems.
Let's take a popular divide and conquer sorting algorithm called merge sort to help understand.
Merge sort allows us to sort a list into ascending order, so [4, 3, 2]
becomes [2, 3, 4]
MergeSort.sort([4, 3, 2])
[2, 3, 4]
Now, how might you implement a sorting function from scratch?
Well, let's start simple, and work our way up.
How could you sort an empty list? By doing nothing! An empty list is already sorted, the same goes for a list with one element.
[]
[1]
Now, let's step up the challenge. How would you sort a list with two elements?
You could compare each value and build a new list like so.
[a, b] = [2, 1]
if a <= b, do: [a, b], else: [b, a]
How might we sort a list with four elements?
Well, we know how to sort a list with 2 elements. So what if we split the four-element list
[4, 2, 3, 1]
into two lists?
flowchart
a["[4, 3, 2, 1]"] --> b["[4, 3]"]
a --> c["[2, 1]"]
We can use Enum.split/2 with the midpoint of the list (2
) to do that.
Enum.split([4, 3, 2, 1], 2)
Then we already know how to sort those lists.
flowchart
b["[4, 3]"] --> d["[3, 4]"]
c["[2, 1]"] --> e["[1, 2]"]
{[a, b], [c, d]} = Enum.split([4, 3, 2, 1], 2)
list_a = if a <= b, do: [a, b], else: [b, a]
list_b = if c <= d, do: [c, d], else: [d, a]
{list_a, list_b}
Now here's a problem we don't know the answer to. How do we merge two sorted lists?
Anytime we encounter a problem we don't know the solution to, it can be wise to consider the simplest case. Sometimes this is called the base case.
So the simplest lists to merge would be two empty lists, or a list with one element and an empty list.
We'll start organizing our code in a MergeSort
module.
MergeSort.merge([], [])
[]
MergeSort.merge([1], [])
[1]
MergeSort.merge([], [2])
[2]
In code that might be
defmodule MergeSort do
def merge([], []), do: []
def merge([], list_b), do: list_b
def merge(list_a, []), do: list_a
end
MergeSort.merge([1], [])
We want to sort two single element lists.
MergeSort.merge([2], [1])
[1, 2]
Things get trickier, but we can reuse the knowledge of how to sort a list with two elements. We've already sorted one two-element list, so separate the head of each list, and compare them.
flowchart TB
A --> B
B --> C
subgraph A[step 0]
direction LR
list_a["[a]"] ---
list_b["[b]"]
end
subgraph B[step 1]
a --- c1[compare] --- b
end
subgraph C[step 2]
c2[compare] --> g[greater than] --> f["[b, a]"]
c2[compare] --> l[less than or equal] --> t["[a, b]"]
end
We return [head_a, head_b]
When the head of list_a is less than or equal to the head of list_b.
Otherwise we return [head_b, head_a]
def merge([head_a | _], [head_b | _]) when head_a <= head_b do
[head_a, head_b]
end
def merge([head_a | _], [head_b | _]) do
[head_b, head_a]
end
Here's all of that put together into a MergeSort
module.
defmodule MergeSort do
def merge([], []), do: []
def merge([], list_b), do: list_b
def merge(list_a, []), do: list_a
def merge([head_a | _], [head_b | _]) when head_a <= head_b do
[head_a, head_b]
end
def merge([head_a | _], [head_b | _]) do
[head_b, head_a]
end
end
MergeSort.merge([2], [1])
Here's the fun part, where we involve recursion. Likely this is the most challenging leap. How do we go from sorting two single element lists to multi-element lists?
MergeSort.merge([3, 4], [1, 2])
[1, 2, 3, 4]
Our merge/2
function currently gets us part the way there, it compares the heads of the two
sorted lists and sorts them.
MergeSort.merge([3, 4], [1, 2])
So, what if we took the sorted head, then attempted to merge the remaining elements in the two lists? You can imagine that visualized step by step as:
[a | _] = MergeSort.merge([3, 4], [1, 2])
[b | _] = MergeSort.merge([3, 4], [2])
[c | _] = MergeSort.merge([3, 4], [])
[d | _] = MergeSort.merge([4], [])
[a, b, c, d]
To actually implement this, instead of returning [head_a, head_b]
or [head_b, head_a]
, we
instead, recursively merge the remaining elements.
def merge([head_a | tail_a], list_b) when head_a <= hd(list_b) do
[head_a | merge(tail_a, list_b)]
end
def merge(list_a, [head_b | tail_b]) do
[head_b | merge(list_a, tail_b)]
end
Put together, here's our new MergeSort
module. It can now merge two pre-sorted lists.
defmodule MergeSort do
def merge([], []), do: []
def merge([], list_b), do: list_b
def merge(list_a, []), do: list_a
def merge([head_a | tail_a], list_b) when head_a <= hd(list_b) do
[head_a | merge(tail_a, list_b)]
end
def merge(list_a, [head_b | tail_b]) do
[head_b | merge(list_a, tail_b)]
end
end
MergeSort.merge([3, 4], [1, 2])
Now that we know how to merge two sorted lists of any size, we're ready to put everything together
to create the MergeSort.sort/1
function.
MergeSort.sort([4, 2, 3, 1])
[1, 2, 3, 4]
In a divide and conquer algorithm we typically split the collection in half recursively
To split any list in half, we can use the Enum.split/2 function with the midpoint of the list rounded down. To find the midpoint of the list, we can use Enum.count/1 function to find the length and then divide the count by 2. We use integer division to ensure the count is rounded down.
def halve(list) do
length = Enum.count(list)
mid_point = div(length, 2)
Enum.split(list, mid_point)
end
During the sort, we will recursively half the list to create a branching tree of sublists (ensuring that we stop when the list has 1 or 0 elements)
flowchart
A["[4, 2, 3, 1]"]
B1["[4, 2]"]
B2["[3, 1]"]
C1["[4]"]
C2["[2]"]
C3["[3]"]
C4["[1]"]
A --> B1
A --> B2
B1 --> C1
B1 --> C2
B2 --> C3
B2 --> C4
defmodule MergeSort do
def halve(list) do
length = Enum.count(list)
mid_point = div(length, 2)
Enum.split(list, mid_point)
end
def sort([head | []]), do: [head]
def sort(list) do
{list_a, list_b} = halve(list)
left_sorted = sort(list_a)
right_sorted = sort(list_b)
[left_sorted, right_sorted]
end
end
MergeSort.sort([4, 2, 3, 1])
Once we have these sublists, we can then merge them together.
flowchart
A["[1, 2, 3, 4]"]
B1["[2, 4]"]
B2["[1, 3]"]
C1["[4]"]
C2["[2]"]
C3["[3]"]
C4["[1]"]
subgraph 1[left sorted]
C1
end
subgraph 2[right sorted]
C2
end
subgraph 3[left sorted]
C3
end
subgraph 4[right sorted]
C4
end
subgraph 5[left sorted]
B1
end
subgraph 6[right sorted]
B2
end
M1[merge]
M2[merge]
M3[merge]
1 --- M1
2 --- M1
M1 --- 5
3 --- M2
4 --- M2
M2 --- 6
5 --- M3
6 --- M3
M3 --- A
def sort([head | []]), do: [head]
def sort(list) do
{list_a, list_b} = halve(list)
left_sorted = sort(list_a)
right_sorted = sort(list_b)
merge(left_sorted, right_sorted)
end
All put together, we've finished our MergeSort
!
defmodule MergeSort do
def merge([], []), do: []
def merge([], list_b), do: list_b
def merge(list_a, []), do: list_a
def merge([head_a | tail_a], list_b) when head_a <= hd(list_b) do
[head_a | merge(tail_a, list_b)]
end
def merge(list_a, [head_b | tail_b]) do
[head_b | merge(list_a, tail_b)]
end
def halve(list) do
length = Enum.count(list)
mid_point = div(length, 2)
Enum.split(list, mid_point)
end
def sort([head | []]), do: [head]
def sort(list) do
{list_a, list_b} = halve(list)
left_sorted = sort(list_a)
right_sorted = sort(list_b)
merge(left_sorted, right_sorted)
end
end
MergeSort.sort([4, 2, 3, 1])
Now truthfully, you won't often need to write your own sorting algorithms, however, it's useful to be aware of divide and conquer as a problem-solving tool.
Most of the time, you'll find built-in solutions or libraries for common problems.
In fact, we must (shamefully) admit that the MergeSort
solution has worse performance than
the built-in Enum.sort
module.
Benchee.run(
%{
"enum" => fn list -> Enum.sort(list) end,
"merge_sort" => fn list -> MergeSort.sort(list) end
},
inputs: %{
"small" => 1..1000 |> Enum.to_list() |> Enum.shuffle(),
"medium" => 1..10_000 |> Enum.to_list() |> Enum.shuffle()
}
)
In the Elixir cell below see if you can improve the performance of the MergeSort
in this new
ModifiedMergeSort
module. Ensure its behavior remains the same.
Hint
You might consider removing the assignment of values to variables. These intermediate steps may improve readability, but not performance.defmodule ModifiedMergeSort do
def merge([], []), do: []
def merge([], list_b), do: list_b
def merge(list_a, []), do: list_a
def merge([head_a | tail_a], list_b) when head_a <= hd(list_b) do
[head_a | merge(tail_a, list_b)]
end
def merge(list_a, [head_b | tail_b]) do
[head_b | merge(list_a, tail_b)]
end
def halve(list) do
length = Enum.count(list)
mid_point = div(length, 2)
Enum.split(list, mid_point)
end
def sort([head | []]), do: [head]
def sort(list) do
{list_a, list_b} = halve(list)
left_sorted = sort(list_a)
right_sorted = sort(list_b)
merge(left_sorted, right_sorted)
end
end
Once you have refactored the above code, you can use this Benchmark to verify your solution is faster or slower.
Benchee.run(
%{
"original" => fn list -> MergeSort.sort(list) end,
"modified" => fn list -> ModifiedMergeSort.sort(list) end
},
inputs: %{
"small" => 1..1000 |> Enum.to_list() |> Enum.shuffle(),
"medium" => 1..10_000 |> Enum.to_list() |> Enum.shuffle(),
"large" => 1..100_000 |> Enum.to_list() |> Enum.shuffle()
}
)
Run the following in your command line from the beta_curriculum folder to track and save your progress in a Git commit.
$ git add .
$ git commit -m "finish deprecated divide and conquer section"