Archives

The List

« LOWER() :: MySQL Function of the Day | Main | DATE_FORMAT :: MySQL Function of the Day »

MySQL Pop Quiz #14

By Carsten | March 10, 2008

This quiz is based on an email I got from a somewhat annoyed Jesper Krogh.

btw, I’m still looking for new entries. I get quite a few suggestions, but not all of them make it into quiz questions. Do send in your suggestions!

Given:

SELECT * FROM car_parts:
+---------+---------+-------+--------+
| part_id |  car_id | price | weight |
+---------+---------+-------+--------+
|       1 |       1 |  2.20 |   3.00 |
|       2 |       1 |  3.30 |   5.00 |
|       3 |       1 |  4.40 |   7.00 |
|       4 |       2 |  2.20 |   9.00 |
|       5 |       2 |  3.30 |  11.00 |
|       6 |       2 |  4.40 |  13.00 |
+---------+---------+-------+--------+

We want to know the least expensive part of each car, and how much that part weighs:

SELECT car_id, MIN(price), weight
FROM parts
GROUP BY car_id;
+---------+------------+--------+
|  car_id | MIN(price) | weight |
+---------+------------+--------+
|       1 |       2.20 |   3.00 |
|       2 |       2.20 |   9.00 |
+---------+------------+--------+

Which looks fine. We then do a similar search to find most expensive part of each car, and — again — how much that part weighs:

SELECT car_id, MAX(price), weight
FROM parts
GROUP BY part_of

which gives us:

+---------+------------+--------+
|  car_id | MAX(price) | weight |
+---------+------------+--------+
|       1 |       4.40 |   3.00 |
|       2 |       4.40 |   9.00 |
+---------+------------+--------+

Question 1: Why is the content of the weight column wrong in the second query?

[answer]Answer 1: Actually, the weight column is wrong in both queries, in the sense that it’s completely indeterministic. We just happened to get lucky in the first query. Unlike other RDBMSs, MySQL does not enforce a requirement to have all non-aggregated columns named in the GROUP BY clause. But if you don’t, then you need to ensure that any data on which you’re not grouping is uniquely determined by those columns on which you do group.[/answer]

Question 2: What’s the reason these queries aren’t rejected?

[answer]Answer 2: [Update: Apparently when writing this, I made myself guilty of propagating a myth. Make sure to read the comments for this post, especially the reference to Roland's article]. Unlike other RDBMSs, MySQL allows you to GROUP BY columns that aren’t named in the SELECT clause. This can be handy in situations where you’re really not interested in the column values on which grouping occurs, or where one column value is determined by another (such as a description field being determined by the part_id) and a grouping/sorting step can be avoided. A side effect of this feature is that MySQL must allow you to have non-aggregated columns in the GROUP BY clause as well.

MySQL manual entry on GROUP BY and hidden fields
[/answer]

[popps]

Topics: MySQL, Pop Quiz |

6 Responses to “MySQL Pop Quiz #14”

  1. Francesco Riosa Says:
    March 10th, 2008 at 12:52 pm

    Using subqueries is a way to solve this specific problem:

    SELECT p1.car_id,
    MIN(p1.price) AS minPrice,
    (SELECT weight
    FROM parts AS p2
    WHERE (p1.part_id, p1.car_id, p1.price) = (p2.part_id, p2.car_id, p2.price)
    ORDER BY p2.price ASC, p2.weight ASC
    LIMIT 1
    ) AS weightOfMinPrice
    FROM parts AS p1
    GROUP BY p1.car_id;

    Luckily MySQL permit use of LIMIT in subqueries too (but test it on your DB version it was buggy once).

    The query show the lower weight product that has the minimum prize, it’s possible to achieve the four combination of min/max between price and weight changing the order by from ASC to DESC.

  2. Jay Pipes Says:
    March 10th, 2008 at 3:35 pm

    Good reading from Roland on this subject (dispelling the myth that the SQL standard *requires* GROUP BY to list all non-aggregate/grouped fields used in the SELECT):

    http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html

  3. Roland Bouman Says:
    March 10th, 2008 at 4:36 pm

    Hi!

    I believe answer #2 is wrong:

    “Unlike other RDBMSs, MySQL allows you to GROUP BY columns that aren’t named in the SELECT clause.”

    There is no such restriction in any RDBMS that I know of. It is the other way around as described in the manual section you refer to:

    “MySQL extends the use of GROUP BY so that you can use non-aggregated columns or calculations in the SELECT list that do not appear in the GROUP BY clause.”

    kind regards,
    Roland

  4. Bill Karwin Says:
    March 10th, 2008 at 8:13 pm

    Yes, an ambiguity occurs when an expression in the select-list has multiple values per distinct value in the column(s) named in the GROUP BY clause. Roland’s explanation is excellent.

    The database I have the most experience with besides MySQL is InterBase (aka Firebird). InterBase’s implementation of SQL is based on SQL-92, and it actually throws a SQL error if you try to run a query of the form “SELECT a, b FROM t GROUP BY a”.

    The problem is that most databases aren’t optimized to have a functional dependency map on the tip of their tongue at all times. So it’s hard for the SQL engine to know if a grouped query will result in an ambiguous result set or not. That’s probably for the best; it would be expensive to track this meta-information, and it would be useful infrequently.

    In any case, whether it results in an error or not, there is such a thing as the “Single Value Rule” in SQL-99. Requiring that any non-aggregate column must be named in the GROUP BY clause is a simple guideline, and it does prevent ambiguous results, but certainly it’s more subtle than that.

  5. Carsten Says:
    March 10th, 2008 at 9:34 pm

    Thanks for the comments. As you can see, I’ve updated answer #2 to point people to read those.

  6. Guerrero de Terracota » MySQL Pop Quiz #14 Says:
    March 31st, 2008 at 9:10 am

    [...] ???????????? ? ?????????? Carsten Pedersen. ???????? [...]

Comments