PHP array_merge is Slow
By Carsten | April 30, 2008
…or I’m doing something stupid, in which case I hope someone would enlighten me.
We fetch data from two different MySQL servers, get the results back as arrays ($ar1 and $ar2), and then concatenate the two arrays. $ar1 consists of 30 to 200 elements, sometimes more. $ar2 typically contains 30 elements.
The PHP way of doing this is:
$ar1 = array_merge($ar1, $ar2);
and the home-grown version is
foreach ($ar2 as $i) {
    $ar1[] = $i;
}
While I do realize that “the PHP way” involves creating a new copy of $ar1 along the way, my assumption before testing this was that, being an internal function with no further parsing or interpretation to be done, it would be much faster.
Doing some microtime() estimations while keeping $ar2 constant at 30 elements, I found:
- At 1-10 elements in $ar1, array_merge is about 33% faster.
- At 20-40 elements in $ar1, the execution speed is about the same.
- Above 40 elements in $ar1, execution speed is constant for the home-grown version (no surprise there) and grows progressively worse for array_merge.
That array_merge gets slower with more elements in $ar1 is not surprising. Whereas the home-grown version just adds on to $ar1, array_merge must create a copy of that array along the way. But the speed at which the performance decreases surprised me a great deal: Already at 100 elements, it takes twice as long to complete as the home-grown version. At 200 elements, it takes three times as long.
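The kind of microtime() measurement described above can be sketched roughly as follows. This is not the script used for the numbers in this post: the function name time_concat, the round count and the test values are made up for illustration, and timings will vary with PHP version and hardware.

```php
<?php
// Hypothetical benchmark helper; name and round count are illustrative only.
function time_concat(int $n, array $ar2, bool $useBuiltin, int $rounds = 2000): array
{
    // Build independent input arrays up front so the setup is not timed.
    $inputs = [];
    for ($r = 0; $r < $rounds; ++$r) {
        $inputs[] = range(1, $n);
    }

    $start = microtime(true);
    for ($r = 0; $r < $rounds; ++$r) {
        $ar1 = &$inputs[$r];              // work on the stored array directly
        if ($useBuiltin) {
            $ar1 = array_merge($ar1, $ar2);
        } else {
            foreach ($ar2 as $i) {
                $ar1[] = $i;
            }
        }
        unset($ar1);                      // drop the reference before the next round
    }
    $elapsed = microtime(true) - $start;

    return [$elapsed, $inputs[0]];        // seconds, plus one result for checking
}

$ar2 = range(1001, 1030);                 // 30 elements, as in the post
foreach ([10, 40, 100, 200] as $n) {
    [$tMerge, $merged] = time_concat($n, $ar2, true);
    [$tLoop, $looped]  = time_concat($n, $ar2, false);
    printf("%3d elements: array_merge %.5fs, foreach %.5fs\n", $n, $tMerge, $tLoop);
}
```

Each round starts from its own copy of $ar1, so the loop variant really appends in place instead of paying for a copy-on-write first.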
April 30th, 2008 at 4:27 pm
Interesting. However, your home-grown version isn’t doing everything the built-in version does.
From the manual entry for array_merge:
If the input arrays have the same string keys, then the later value for that key will overwrite the previous one. If, however, the arrays contain numeric keys, the later value will not overwrite the original value, but will be appended.
So it’s really doing something like the following:
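foreach ($ar2 as $key => $i) {
    // PHP array keys are either ints or strings; numeric strings such as "8" are stored as ints.
    if (is_int($key)) {
        $ar1[] = $i;       // integer key: append under the next free index
    } else {
        $ar1[$key] = $i;   // string key: the later value overwrites the earlier one
    }
}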
It’s probably that, plus the time needed to build a whole new array piece by piece.
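A small illustration of the manual's rule quoted above; the array contents are made up for the example:

```php
<?php
// Example data only; not taken from the post.
$ar1 = ['color' => 'red', 0 => 'a', 5 => 'b'];
$ar2 = ['color' => 'blue', 0 => 'c'];

// The string key 'color' is overwritten by the later value.
// Integer keys are renumbered from 0 and their values appended.
$merged = array_merge($ar1, $ar2);

var_export($merged);
```

Note that array_merge also renumbers the integer keys of the first array (5 becomes 1 here), which a plain append loop over $ar2 would not do.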
April 30th, 2008 at 4:42 pm
Bill, thanks for the comment. I agree that the built-in version is more advanced in many ways. Still, we’re talking compiled C code vs. interpreted PHP, which is why I’m surprised that there’s such a difference with a relatively low number of key/value pairs. Also, in my experiment both arrays are indeed indexed by numbers, not strings.
April 30th, 2008 at 5:19 pm
That is interesting, but not too unexpected.
Depending on whether keys are important or not, you’ll find that:
$ar1 += $ar2;
is actually the fastest!
April 30th, 2008 at 6:26 pm
Joel, while the keys aren’t important, they are duplicated between the two arrays (they are numerical). Using += drops every value from $ar2 whose key already exists in $ar1, which is not feasible in this context.
But += is very fast indeed! Thanks for the pointer.
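To make the difference concrete, here is a minimal comparison on two numerically indexed arrays (the values are invented for the example):

```php
<?php
// Example data only; both arrays use the default keys 0, 1, 2, ...
$ar1 = ['a', 'b', 'c'];
$ar2 = ['x', 'y', 'z', 'w'];

// Union: keys 0..2 already exist in $ar1, so only key 3 of $ar2 is added.
$union = $ar1 + $ar2;

// Merge: integer keys are renumbered, so every value survives.
$merged = array_merge($ar1, $ar2);

var_export($union);
var_export($merged);
```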
April 30th, 2008 at 8:36 pm
The compiled C code still has to deal with the PHP data structures in the same manner PHP code does. PHP arrays are not very space or time efficient, so the more complex the things you do with them, the worse off you are. It would be interesting to do exactly what array_merge does in PHP and measure the time difference versus the native function call. PHP should still be slower, as long as you aren’t using the Zend optimizer.
May 6th, 2008 at 9:44 pm
Carsten,
So you are doing PHP work now? That’s really cool!
Long time no see!
–Joao
June 21st, 2008 at 8:32 pm
If the array you’re iterating over is a vector (consecutive integer keys starting at 0) and not a hash, then foreach is the slow way to do it. The fast way would be:
for ($ii = 0, $ll = count($ar2); $ii < $ll; ++$ii) {
    $ar1[] = $ar2[$ii];
}
(Also, note ++$ii instead of $ii++, which is a bit faster in PHP).
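The indexed loop only works when $ar2 really has the keys 0, 1, 2, and so on. A minimal sketch, assuming a hypothetical helper append_vector and made-up data, of reindexing with array_values() first when that is not guaranteed:

```php
<?php
// Hypothetical helper wrapping the for loop above; $ar2 must have keys 0..n-1.
function append_vector(array &$ar1, array $ar2): void
{
    for ($ii = 0, $ll = count($ar2); $ii < $ll; ++$ii) {
        $ar1[] = $ar2[$ii];
    }
}

$rows = [3 => 'x', 7 => 'y'];   // a "hash": the keys are not 0 and 1
$ar1  = ['a'];

// array_values() discards the keys and returns a 0-based vector.
append_vector($ar1, array_values($rows));

var_export($ar1);
```

The extra array_values() call costs a copy of its own, so it only makes sense when the key layout of $ar2 is in doubt.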