Thursday, April 28, 2011

sybase - showplan drops estimated I/O greatly by adding another table to query

The application of the query has been changed, but the theory is the same.

4 tables:

states - list of states (index on state) stores - stores, and the state they're in (separate indexes on state, store) store_orders - ties an order to a store (separate indexes on store, order) order_statuses - has order and status (separate indexes on order, status)

I want to count how many orders are in each status for a particular state.

One query:

select os.order_status, count(os.order_status) from stores sto, store_order so, order_statuses os where sto.state = 'PA' and sto.store = so.store and so.order = os.order group by os.order_status

in the real application, a showplan here comes back with estimated I/O of around 500,000

But if I do:

select os.order_status, count(os.order_status) from states sta, stores sto, store_order so, order_statuses os where sta.state = 'PA' where sto.state = sta.state and sto.store = so.store and so.order = os.order group by os.order_status

The I/O is estimated at around 2,000, and all I did to change the query was join the states table to the stores table, and specify the exact value in states instead of stores.

Isn't the DBM still going to have to do the same searching of stores whether I specify an explicit value, or I join another table with that explicit value?

I'm trying to figure out why the estimated I/O is so much higher.

I'm no DBA, but trying to learn as much as I can, because I hit things like this all the time, and I'm never quite sure why.

From stackoverflow
  • Lazy Bob, Can you please also turn IO statistics on and then add the results to the question:

    set statistics io on

    showplan by itself will not give that info as you reported in the comments

    also, please post exact working query - ideally, formatted correctly.

    Your query says "where sta.state = 'PA' where sto.state = sta.state " and obviously not something you could have run

    Lazy Bob : Thank you DVK... that was exactly the command I needed. After reviewing the *actual* I/O stats, it turns out, as expected, that adding the extra table actually results in a higher I/O cost.
    DVK : OK, can you please print the exact output of both queries' IO results and query plans to help figure out why IO drops by adding a table? (most likely, there are less scans of some large table)

0 comments:

Post a Comment