Thursday, March 31, 2011

Long query prevents inserts

I have a query that runs each night on a table with a large number of records (200,000+). The application simply iterates over the results (using a DbDataReader in a C# app, if that's relevant) and processes each one. The processing is done outside of the database altogether. During the time that the application is iterating over the results I am unable to insert any records into the table that I am querying. The insert statements just hang and eventually time out. The inserts are done in completely separate applications.

Does SQL Server lock the table down while a query is being done? This seems like an overly aggressive locking policy. I could understand how there could be a conflict between the query and newly inserted records, but I would be perfectly ok if records inserted after the query started were simply not included in the results.

Any ways to avoid this?

Update:
The WITH (NOLOCK) hint definitely did the trick. As some of you pointed out, this isn't the cleanest approach. I can't really query everything into memory given the number of records, and some of the columns in this table are binary (some records total about 1 MB of data).

The other suggestion was to query for batches of records at a time. This isn't a bad idea either, but it does bring up a new issue: database-independent queries. Right now the application can work with a variety of different databases (Oracle, MySQL, Access, etc.). Each database has its own way of limiting the rows returned in a query. But maybe that is better saved for another question?

Back on topic, the WITH (NOLOCK) clause is certainly SQL Server specific, so is there any way to keep it out of my query (since including it would stop the query from working with other databases)? Maybe I could somehow specify a parameter on the DbCommand object? Or can I specify the locking policy at the database level, that is, change some setting in SQL Server itself so that the table doesn't lock like this by default?

From stackoverflow
  • It depends on what isolation level you are using. You might try doing your selects with the WITH (NOLOCK) hint; that will prevent the read locks, but it also means the data being read might change before the selecting transaction completes.

  • If you add the WITH (NOLOCK) hint after a table name in the FROM clause, the query won't take shared locks and won't wait on data that other transactions have locked. You might get "out of date" results if you are writing at the same time, but if you don't care about that then you should be fine.

  • I reckon your best way of avoiding this is to do it in SQL rather than in the application.

    You can add a

    WAITFOR DELAY '00:00:01'

    at the end of each loop iteration to provide time for other processes to run - just make sure that you haven't initiated a TRANSACTION such that all other processes are locked out anyway
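
    For example, a rough sketch of that loop-with-a-pause shape; the table and column names (Records, Processed) and the placeholder UPDATE standing in for the real per-row work are all assumptions:

    DECLARE @Done bit;
    SET @Done = 0;

    WHILE @Done = 0
    BEGIN
        -- handle a small chunk of work per iteration (placeholder for the real processing)
        UPDATE TOP (1000) Records
        SET Processed = 1
        WHERE Processed = 0;

        IF @@ROWCOUNT = 0
            SET @Done = 1;

        -- pause so other sessions get a chance to take their locks
        WAITFOR DELAY '00:00:01';
    END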

  • The first thing you could do is try adding WITH (NOLOCK) to any tables you have in your query. This will tame down the locking that SQL Server does. An example of using NOLOCK on a join is as follows...

    SELECT COUNT(Users.UserID)
        FROM Users WITH (NOLOCK)
           JOIN UsersInUserGroups WITH (NOLOCK) ON 
              Users.UserID = UsersInUserGroups.UserID
    

    Another option is to use a DataSet instead of a DataReader. A DataReader is a "fire hose" technique that stays connected to the tables while your program is processing, handling the results row by row through the hose. A DataSet uses a "disconnected" methodology where all the data is loaded into memory and then the connection is closed. Your program can then loop over the data in memory without having to worry about locking. However, if this is a really large amount of data, there may be memory issues.
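
    A rough C# sketch of that disconnected approach (the connection string, query, table and column names, and the Process method are all placeholders):

    using System.Data;
    using System.Data.SqlClient;

    // Pull everything into memory first, then process with the connection closed,
    // so no read locks are held while each record is handled.
    var dataSet = new DataSet();
    using (var connection = new SqlConnection(connectionString))
    using (var adapter = new SqlDataAdapter("SELECT RecordID, Payload FROM Records", connection))
    {
        adapter.Fill(dataSet);   // Fill opens and closes the connection itself
    }

    foreach (DataRow row in dataSet.Tables[0].Rows)
    {
        Process(row);            // placeholder for the per-record work
    }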

    Hope this helps.

  • The query is performing a table lock, thus the inserts are failing.

    It sounds to me like you're keeping a lock on the table while processing the results. You should instead load them into an array or collection of some sort, and close the database connection. Then process the array.

    In addition, while you're doing your select, use either WITH (NOLOCK) or WITH (READPAST).
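
    For example (table and column names here are just placeholders), the difference between the two hints:

    -- NOLOCK: reads rows even if they are locked, so it can return uncommitted data
    SELECT RecordID, Payload FROM Records WITH (NOLOCK);

    -- READPAST: skips rows that other transactions currently hold locks on
    SELECT RecordID, Payload FROM Records WITH (READPAST);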

  • I'm not a big fan of using lock hints as you could end up with dirty reads or other weirdness. A couple of other ideas:

    • Can you break the number of rows down so you don't grab 200k at a time? Is there a way to tell whether you've processed a row (a flag or a timestamp) that you could use in the query? Your query could be 'SELECT TOP 5000 ...', getting a different 5k each time (see the sketch after this answer). Shorter queries mean shorter-lived locks.

    • If you can use smaller sets of rows I like the DataSet vs. IDataReader idea. You will be loading data into memory and not consuming any SQL locks, but the amount of memory can cause other problems.

    -Brian
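
    A rough sketch of the batching idea above, assuming a Processed flag column exists (all table and column names are placeholders):

    DECLARE @LastIdInBatch int;   -- highest RecordID the application just finished with

    -- grab the next unprocessed chunk; a shorter query means shorter-lived locks
    SELECT TOP (5000) RecordID, Payload
    FROM Records
    WHERE Processed = 0
    ORDER BY RecordID;

    -- once the application has handled those rows, flag them so the next query skips them
    UPDATE Records
    SET Processed = 1
    WHERE Processed = 0 AND RecordID <= @LastIdInBatch;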

  • You should be able to set the isolation level at the .NET level so that you don't have to include the WITH (NOLOCK) hint.

    If you want to go with the batching option, you should be able to specify the row count setting from the .NET level, which would tell the database to return only the first n records. By setting these options at the .NET level they should be database independent and work across all the platforms.
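
    For instance, a rough provider-agnostic sketch using the DbConnection/DbCommand classes the question already mentions (the provider factory, connection string, and query are placeholders):

    using System.Data;
    using System.Data.Common;

    // Ask for READ UNCOMMITTED at the transaction level instead of a NOLOCK hint,
    // so the SQL text itself stays database independent.
    using (DbConnection connection = providerFactory.CreateConnection())
    {
        connection.ConnectionString = connectionString;
        connection.Open();

        using (DbTransaction transaction =
                   connection.BeginTransaction(IsolationLevel.ReadUncommitted))
        using (DbCommand command = connection.CreateCommand())
        {
            command.Transaction = transaction;
            command.CommandText = "SELECT RecordID, Payload FROM Records";

            using (DbDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // process each row outside the database, as before
                }
            }

            transaction.Commit();
        }
    }

    Whether every provider honours ReadUncommitted the same way is a separate question, but at least the SQL text stays portable.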

  • If you're using SQL Server 2005+, then how about giving the new MVCC snapshot isolation a try. I've had good results with it:

    -- [YourDatabase] is a placeholder for the target database name
    ALTER DATABASE [YourDatabase] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON;
    ALTER DATABASE [YourDatabase] SET MULTI_USER;
    

    It will stop readers blocking writers and vice-versa. It eliminates many deadlocks, at very little cost.

    Sam Saffron: +1, much better suggestion than WITH (NOLOCK)
