Archives

The List

« MySQL Certification on Cramsession.com | Main | Books Worth Reading »

For a given value of “natively”…

By Carsten | August 18, 2005

Mark Leith writes:

We had a question in #mysql on freenode yesterday, asking if there was a function to strip digits from a string. The answer is of course - not natively.

A statement which depends rather heavily on your definition of “natively”, I’d say. Mark goes on to provide a stored procedure called strip_digits, which does this for you. “It even performs fairly well!”.

(At this point, you should read Marks post — otherwise, what I’m about to say will make very little sense)

Mark is no fool, as he often proves on #mysql, so I’ll gladly take any advice he will offer. It’s just that whenever I see such a couple of unqualified statements, I can’t keep myself from doing some testing…

For a given value of “natively”

It’s true that there is no single MySQL function that will magically strip the digits from your string. You can, however, do this by combining native mysql functions (one function, actually).

To strip the digits from the string ‘a1b2c3d4e5f6g7h8i9j’, all you need to do is to embed a bunch of REPLACE commands within one another:

mysql> SELECT
    -> REPLACE(
    ->  REPLACE(
    ->   REPLACE(
    ->    REPLACE(
    ->     REPLACE(
    ->      REPLACE(
    ->       REPLACE(
    ->        REPLACE(
    ->         REPLACE(
    ->          REPLACE('a1b2c3d4e5f6g7h8i9j', '0', ''),
    ->          '1', ''),
    ->         '2', ''),
    ->        '3', ''),
    ->       '4', ''),
    ->      '5', ''),
    ->     '6', ''),
    ->    '7', ''),
    ->   '8', ''),
    ->  '9', '') AS result;
+------------+
| result     |
+------------+
| abcdefghij |
+------------+
1 row in set (0.00 sec)

The innermost REPLACE removes all zeroes, then passes the result to the next level which removes all ones, etc.

Certainly, saying “SELECT strip_digits(’a1b2c3d4e5f6g7h8i9j’)//” is more aesthetically pleasing. But at what cost?

Performance

As stated above, I’m inclined to believe pretty much anything Mark will say on the subject of MySQL. Still, I had some hesitations when seeing “It even performs fairly well!”

A quick (but somewhat simplistic) way to test performance of different methodologies is to turn off the query cache and use the BENCHMARK function. BENCHMARK loops over an expression a given number of times, and always returns 0, which is in itself quite uninteresting. But we can then look at the execution time to see how long things took.

So, without further ado, here’s the result of running each function 1 mio times…

mysql> SELECT BENCHMARK(1000000, strip_digits('a1b2c3d4e5f6g7h8i9j'))//
+---------------------------------------------------------+
| benchmark(1000000, strip_digits('a1b2c3d4e5f6g7h8i9j')) |
+---------------------------------------------------------+
|                                                       0 |
+---------------------------------------------------------+
1 row in set (11 min 56.34 sec)

mysql> SELECT BENCHMARK(1000000, REPLACE(REPLACE(REPLACE(REPLACE(
    -> REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    -> REPLACE('a1b2c3d4e5f6g7h8i9j','0',''),'1',''),'2',''),
    -> '3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','')) AS result;
+--------+
| result |
+--------+
|      0 |
+--------+
1 row in set (2.96 sec)

Ouch.

It looks as if using the native functions is some 242 times faster than using the stored procedure solution suggested.*

Oh, and the stored procedure (as presented) will only accept strings of lenghts less than 50… The solution given above has no such limits.

Aesthetics

Comparing a “native” query to a stored procedure is not entirely fair — any stored procedure will incur an overhead. So let’s see what happens when we combine the beauty of a stored procedure and the native functionality (accepting that we now limit ourselves to fairly short strings):

mysql> # define new strip_digits:
mysql> DELIMITER //
mysql> DROP FUNCTION strip_digits2 //
Query OK, 0 rows affected (0.12 sec)

mysql> CREATE FUNCTION strip_digits2 (str VARCHAR(50))
    -> RETURNS VARCHAR(50)
    -> BEGIN
    ->
    -> RETURN REPLACE(
    ->         REPLACE(
    ->          REPLACE(
    ->           REPLACE(
    ->            REPLACE(
    ->             REPLACE(
    ->              REPLACE(
    ->               REPLACE(
    ->                REPLACE(
    ->                 REPLACE(str, '0', ''),
    ->                 '1', ''),
    ->                '2', ''),
    ->               '3', ''),
    ->              '4', ''),
    ->             '5', ''),
    ->            '6', ''),
    ->           '7', ''),
    ->          '8', ''),
    ->         '9', '');
    -> END;
    -> //
Query OK, 0 rows affected (0.01 sec)

mysql> DELIMITER ;
mysql> # see that it works:
mysql> SELECT strip_digits2('a1b2c3d4e5f6g7h8i9j');
+--------------------------------------+
| strip_digits2('a1b2c3d4e5f6g7h8i9j') |
+--------------------------------------+
| abcdefghij                           |
+--------------------------------------+
1 row in set (0.00 sec)

mysql> # ...and do the benchmark:
mysql> SELECT BENCHMARK(1000000, strip_digits2('a1b2c3d4e5f6g7h8i9j'));
+----------------------------------------------------------+
| BENCHMARK(1000000, strip_digits2('a1b2c3d4e5f6g7h8i9j')) |
+----------------------------------------------------------+
|                                                        0 |
+----------------------------------------------------------+
1 row in set (21.80 sec)

Adding the overhead of the stored procedure call in this case makes things run some 8 times slower. Depending on your particular setup, this may be acceptable, or it may not — still, it’s some 32 times faster than the original stored procedures - solution.

*The benchmark run for Mark’s version was done only once; for the REPLACE version it was run several times. 2.96 sec was the slowest of these.

Topics: MySQL |

2 Responses to “For a given value of “natively”…”

  1. Mark Leith » Blog Archive » Stripping Digits - The benefits of Benchmarking and Profiling Says:
    August 18th, 2005 at 5:26 pm

    [...] anager of at MySQL AB) read my entry about stripping digits, and has done a good write up here on how this function could be improved, a great deal. Have a read of Carstens link, and my previou [...]

  2. BENCHMARK() :: MySQL Function of the Day | Carsten’s Random Ramblings Says:
    April 16th, 2008 at 10:09 am

    [...] You can see a real-world example where BENCHMARK() was used to compare two approaches to string manipulation here. [...]

Comments