Maximum Subarray
Problem statement:
Find the largest sum that is possible in an array. I.e., find the maximum sub-array.
Solution:
From the very first this problem looks complicated because of how it is framed. It sounds like we want to find the MAXIMUM from ALL THE POSSIBLE sums. To find the maximum means comparison among all the elements, which is O(n) at the least. To find all the possible sums from all the elements – that must be O(n2). So immediately, in our mind, we are thinking “how to decrease its complexity” ergo it is already complex in our mind.
Instead, let us focus on simply solving the problem like a layperson.
We have an array:
1 |
a[0], a[1], a[2].... a[i]... a[n] |
We iterate over it…
… 1st element is a[0]. Okay. There aren’t any other elements so it is the maximum so far.
… 2nd element is a[1]. Now we have two elements – a[0] and a[1]. How do we figure out the maximum possible sum between these two? Let’s see:
- Either a[0] + a[1] is actually the maximum sum
- Or a[0] is the maximum sum. Which means a[1] must be negative?
- Or a[1] is the maximum sum. Which means a[0] must be negative?
Actually, who cares if a[0] or a[1] is positive or negative. Either a[1] is part of the already running sum, or it is ending it. If it part of the already running sum, then it will be simply added to it and we can proceed to the next element a[2] (if it is present). If it is ending it, then we can try to check if it starts the new maximum subarray or not. a[1] becomes the new a[0], so to speak. The maximum subarray will be whichever one is bigger between a[0] and a[1].
Now we get to the element a[2]. Before we decide if it is part of running sum or not, let us make it clear we already have a running sum. It was either started by a[0] and a[1] contributed to it, or it was started by a[1] and we have yet to decide which of the a[0] and a[1] is actually the maximum. Either way, there is already a running sum when we encounter a[2].
Using the same logic as before, either a[2] contributes to it, or a[2] displaces it. If it contributes to it, we simply add a[2] and proceed to the next element a[3] (if it is present). If it displaces it then it is either displacing the one that was started by a[0] or one started by a[1].
If we had already compared a[0] and a[1] in the previous step and stored the result, we would know who a[2] is displacing. Since we are only interested in maximum sub-array, we only need to keep track of who we are displacing and not where these displaced entries are ranked.
But how do we decide if a[2] is actually displacing the previous running sum or just adding to it? The only way it will not be added to the previous sum is when it not only decreases the running sum, it decreases it by so much that we would rather start from a[2]. Or,
1 |
running sum + a[2] < a[2] |
running sum is negative.
Just so it is clear, let’s see what happens when we encounter a[3]:
- We are tracking who started the maximum running sum.
- Either a[3] is added to it.
- Or it displaces it, in which case we compare it with the previous running sum and store the result to track who started the maximum running sum.
- Go to step 1.
Let’s write some code to check it:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
#include <vector> #include <iostream> using namespace std; void FindMaximumSubArray(const vector& arr) { if (arr.size() == 0) // Must have at least 1 element { return; } size_t start = 0; // First maximum sub-array starts at first element size_t end = start; // and ends at first element too int running_sum = arr[start]; // and its sum is what is in the first element. for (size_t i = 1; i < arr.size(); ++i) { int new_sum = running_sum + arr[i]; if (new_sum < arr[i]) // If new sum is less that previous { // then new element is displacing running_sum = arr[i]; // the previous running sum start = i; end = start; } else { // Otherwise we just add to the previous running sum running_sum = new_sum; end = i; } } cout << running_sum << " from " << arr[start] << "/" << start << " to " << arr[end] << "/" << end << endl; } |
No programming without testing!
1 2 3 4 5 6 |
Input (a): 1 2 3 -2 5 Output (a): 9 from 1/0 to 5/4 // Look's good! Input (b): -1 -2 -3 -4 Output (b): -4 from -4/3 to -4/3 // Eh?? Input (c): -2 -3 4 -1 -2 1 5 -3 Output (c): 4 from 4/2 to -3/7 // Whaaat?? |
The answers for (b) should obviously be -1 (only the first element), and for (c) it should be 7 (starting from 4 to 5). Hmm… we made the mistake of assuming the new element is going to start the new maximum sub-array. It is possible that new element just ends it and is not part of the new maximum sub-array! The running sum is not necessarily the maximum! We must track maximum subarray:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
#include <vector> #include <iostream> using namespace std; void FindMaximumSubArray(const vector& arr) { if (arr.size() == 0) // Must have at least 1 element { return; } size_t start = 0; // First maximum sub-array starts at first element size_t end = start; // and ends at first element too int running_sum = arr[start]; // and its sum is what is in the first element. size_t max_start = start; size_t max_end = end; int max_sum = running_sum; for (size_t i = 1; i < arr.size(); ++i) { int new_sum = running_sum + arr[i]; if (new_sum < arr[i]) // If new sum is less that previous sum { // then new element is displacing running_sum = arr[i]; // the previous running sum start = i; end = start; } else { // Otherwise we just add to the previous running sum running_sum = new_sum; end = i; } if (max_sum < running_sum) // If running_sum is bigger than maximum sum { // then update the maximum sum with it max_sum = running_sum; max_start = start; max_end = end; } } cout << max_sum << " from " << arr[max_start] << "/" << max_start << " to " << arr[max_end] << "/" << max_end << endl; } |
Testing:
1 2 3 4 5 6 |
Input (a): 1 2 3 -2 5 Output (a): 9 from 1/0 to 5/4 // Look's good! Input (b): -1 -2 -3 -4 Output (b): -1 from -1/0 to -1/0 // Look's good! Input (c): -2 -3 4 -1 -2 1 5 -3 Output (c): 7 from 4/2 to 5/6 // Look's good! |
Look’s good! Complexity is O(n).