Tuesday, March 1, 2011

Search engine logic with pagination, working with large result set

Hi, i've got a website with internal search engine that searches for users with specific parameters like age, height.... There are thousands of users so i've decided to use advanced CTE in order to retrieve 10 records each time to reduce load on MSSQL server.

In this CTE i get results by pages (ie. 10 records per page) while ordering it by lastVisit of user.

Problem is that i can't use default asp.net built it pager control so i've made mine which works fine but i need to get amount of pages in whole result set each time search is generated thus i first get result set and then max number of records so i can divide it by 10 to get latest page number to use in my pager controller.

so each time page loads two db calls are made, while i want to avoid needless amount of records when user just switches a page with same search parameters.

How would you advice to do that ?

From stackoverflow
  • Cache the total number of results, e.g. by assigning some (user-dependent) token and passing it back to server when user is paging? You may also cache the result set itself this way.

    eugeneK : What are the triggers for adding, updating and deleting "max row" cache ? All result set is 50000 records and growing fast, so caching all of it is not an option.
    Dmitry : You may have different strategies depending on how fast does it grow and how big is the probability that new results will be relevant. One option is to use a timeout, when cache expires after some period of user's inactivity. Another option is to calculate a checksum of the newly fetched result set (the first pages) and if it differs from the previous one than something has changed and you want to expire the cache
    eugeneK : What do you mean by calculating checksum for newly fetched result ?
    Dmitry : Well, a fingerprint. A hashcode. Anything which can tell you if the result you just've fetched (you fetch it anyway) differs from the result for which you have a cached value. If it is the same then probably you don't care even if the tail has actually changed. If it is different then you may want to recalculate a cached value of max rows

0 comments:

Post a Comment