Tuesday, April 5, 2011

In SQL, what’s the difference between count(*) and count('x')?

I have the following code:

SELECT <column>, COUNT(*)
FROM <table>
GROUP BY <column> HAVING COUNT(*) > 1;

Is there any difference to the results or performance if I replace the COUNT(*) with COUNT('x')?
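
To make the comparison concrete, here is a minimal sketch that runs both variants of the query side by side. It assumes an in-memory SQLite database driven from Python, and the table `people`, the column `city`, and the helper `duplicates` are hypothetical names chosen for illustration. It only compares results on one engine and says nothing about performance on SQL Server, Oracle, or MySQL.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Oslo"), ("Bob", "Oslo"), ("Cid", "Rome"),
     ("Dee", "Lima"), ("Eve", "Lima"), ("Fay", "Lima")],
)

def duplicates(count_expr):
    """Run the question's query with the given COUNT expression."""
    sql = (
        f"SELECT city, {count_expr} FROM people "
        f"GROUP BY city HAVING {count_expr} > 1 ORDER BY city"
    )
    return conn.execute(sql).fetchall()

star_rows = duplicates("COUNT(*)")
const_rows = duplicates("COUNT('x')")

print(star_rows)   # [('Lima', 3), ('Oslo', 2)]
print(const_rows)  # [('Lima', 3), ('Oslo', 2)]
```

Under these assumptions both variants report the same groups with the same counts.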

(This question is related to a previous one)

From stackoverflow
  • I believe this one has been answered in: http://stackoverflow.com/questions/59294/in-sql-whats-the-difference-between-countcolumn-and-count

    Andrew : That's very similar (and may indeed be the same answer), but I wondered if there is a difference between referencing a specific column (i.e. COUNT(column)) compared to referencing an arbitrary string (i.e. COUNT('x')).
  • The major performance difference is that COUNT(*) can be satisfied by examining the primary key on the table.

    I.e., in the simple case below, the query can be answered from the index alone, without reading the table's data rows.

    SELECT COUNT(*) FROM <table>
    

    I'm not sure whether the query optimizer in SQL Server will do so, but in the example from the question, if the column you are grouping on has an index, the server should be able to satisfy the query without hitting the actual table at all.

    To clarify: this answer refers specifically to SQL Server. I don't know how other DBMS products handle this.

  • This question is slightly different from the other one referenced. In the referenced question, the asker wanted to know the difference between count(*) and count(SomeColumnName), and SQLMenace's answer was spot on.

    To address this question: there is essentially no difference in the result. count(*), count('x'), and, say, count(1) all return the same number. The difference is that with " * ", just as in a SELECT, all columns are returned and then counted. When a constant is used (e.g. 'x' or 1), a row with one column is returned and then counted. The performance difference would show up when " * " returns many columns.

    Update: The above statement about performance is probably not quite right, as discussed in other answers, but it does apply to subselect queries when using EXISTS and NOT EXISTS.

    Andrew : Does that mean COUNT('x') would be faster if the table had many columns, compared to COUNT(*)?
    Brannon : I think this behavior depends on the database and the query optimization applied. It's an obvious optimization to perform when you see COUNT(*). It can only mean one thing: you want the total count of rows, regardless of how many columns the table has.
  • To say that SELECT COUNT(*) vs COUNT(1) results in your DBMS returning "columns" is pure bunk. That may have been the case long, long ago, but any self-respecting query optimizer will choose some fast method to count the rows in the table. There is NO performance difference between SELECT COUNT(*), COUNT(1), and COUNT('this is a silly conversation').

    Moreover, COUNT(1) vs COUNT(*) will NOT have any difference in INDEX usage -- most DBMS will actually optimize COUNT(n) into COUNT(*) anyway. See Ask Tom: Oracle has been optimizing COUNT(n) into COUNT(*) for the better part of a decade, if not longer: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1156151916789

    "problem is in count(col) to count(*) conversion *** 03/23/00 05:46 pm *** one workaround is to set event 10122 to turn off count(col) -> count(*) optimization. Another work around is to change the count(col) to count(*), it means the same, when the col has a NOT NULL constraint. The bug number is 1215372."

    One thing to note: if you are using COUNT(col) (don't!) and col is nullable, then the DBMS actually has to count the non-NULL values of col in the table (via an index scan, histogram, etc. if they exist, or a full table scan otherwise).

    Bottom line: if what you want is the count of rows in a table, use COUNT(*)

    Eric Z Beard : It's not correct to say there's no difference between count(n) and count(*). If you have a covering index that includes n, you get the data straight from the leaf level of the index and don't have to go back to the table, which is much faster.
    Matt Rogish : The DBMS optimizer *will* realize this, and choose the correct index for the job. Provided there is an index, rare is the day that I've seen a DBMS actually **count** rows on the table. Moreover, the presence of NULLs often causes semantic bugs. When you want the # of rows in a table, use COUNT(*)!!!
    Chris Ammerman : @Matt Just a note on the tone of your answer... If you want to get excited about someone else's apparent ignorance, the appropriate place might be the "comments", rather than your own answer. Lacing your answer with slights at others is most decidedly "not helpful".
    Matt Rogish : TI: I disagree: if someone is incorrect and has upvotes, I find it unlikely that comments will 1) spur upvoters to change their votes or 2) that potential upvoters will read the comments before voting. The "comments (n)" link is too easily overlooked.
    Chris Ammerman : @Matt I was thinking more of using comments to tell the answerer that their answer was poorly informed, so they might fix it, rather than to sway other voters. Furthermore, there's a reason downvoting isn't as impactful as upvoting: to encourage spotlighting good answers over burying bad ones.
    Chris Ammerman : @Matt To put it simply, if your answer is a good one, it will continue to gain upvotes and hence prominence on the page, which will in turn push the bad ones out of prominence, and pinch off any continuing erroneous upvotes they might have gotten otherwise. Harsh language is completely unnecessary.
    Matt Rogish : @TI: I agree that commenting is perfectly suited to nudge answers in the right direction. A wrong, upvoted answer that shows no evidence of any investigation ought to be called out. We want this site to be the arbiter of the "correct" answer. Wouldn't an upvoted, wrong answer be exactly opposite?
    Chris Ammerman : @Matt Surely it would if it somehow managed to get to the top of the page. But if it's already at the bottom, or if there haven't been many answers yet, beating the bad answer into submission seems a prematurely strong reaction, unlikely to be necessary in the end.
    Chris Ammerman : @Matt I guess the crux of my point is that the disparity in score between the good answer and the bad is the real indicator of quality. Not whether the score is positive or negative. And I would prefer, for myself, to use the goodwill of upvotes for good answers to create that disparity. YMMV.
    Matt Rogish : @TI: I understand and can see how you'd feel it's "[unnecessarily] harsh". I don't fully agree (it's a matter of style, I think) but I do agree with your assessment of the situation. I'll avoid pulling that trigger so quickly. Hopefully never! :) Thanks for your feedback.
    Chris Ammerman : @Matt Glad we could settle amicably. That's really my biggest concern. I may have exaggerated the harshness some. I was hit by two things. One was a strong fear that if we aren't diligent in kindness & support with our criticism, this place might become the next slashdot.
    Chris Ammerman : @Matt The other thing that hit me was a concern over the future relevance of remarks on other people's answers, in an environment where those answers can be deleted.
    Matt Rogish : @TI: The next /.? THE HORROR!!! :)
    Matt Rogish : @TI: Indeed, as one of my comments was to a posting that now no longer exists. :(
  • MySQL: According to the MySQL website, COUNT(*) is faster for single table queries when using MyISAM:

    http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_count

    I'm guessing that a HAVING clause with a COUNT in it may change things.
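
Several of the answers above separate two cases: COUNT(*) and COUNT of a constant, which count rows, and COUNT(col) on a nullable column, which counts only non-NULL values. The following is a minimal sketch of that distinction, again assuming an in-memory SQLite database driven from Python. The table `t` and its columns are hypothetical, and the sketch only checks results, not query plans or timings.

```python
import sqlite3

# Hypothetical table: `col` is nullable and holds two NULLs out of four rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany(
    "INSERT INTO t (col) VALUES (?)",
    [("a",), (None,), ("b",), (None,)],
)

# COUNT(*) and COUNT(<constant>) count rows; COUNT(col) skips NULLs.
counts = conn.execute(
    "SELECT COUNT(*), COUNT('x'), COUNT(1), COUNT(col) FROM t"
).fetchone()

print(counts)  # (4, 4, 4, 2)
```

The first three values agree because a constant is never NULL, while the last one drops the two NULL rows. That is the semantic trap the COUNT(col) warning refers to.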
