Thursday, March 24, 2011

Select items by tag when searching multiple tags

Hi, I'm struggling a bit here so I thought why not ask:

Every entity in my system has a list of tags (a list of strings), and I want to be able to search for multiple tags at once.

I have an IQueryable to work with. Every entity has an IList<string> called Tags, and my input parameter is also an IList<string>.

I could simply loop over all the tags and call IQueryable.Where(p => p.Tags.Contains(currentTag)) for each one, but that would not scale well with many input tags, and I have the feeling that this could be done inside LINQ.

Hope someone has an idea.

Edit: Clarification of the question: I'm looking for a way to select only those items from my IQueryable that contain ALL of the supplied tags (the IList parameter).

greetings Daniel / Tigraine

From stackoverflow
  • Not sure I really understand what you're asking, but maybe something like the following would work.

    List<string> searchTags = ...
    
    var query = db.MyEntity
                  .Where( e => e.Tags.Intersect( searchTags ).Count() > 0 );
    

    This should give you the set of entities whose tag list contains at least one of the items in searchTags.
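
    The clarified question asks for entities that contain all of the supplied tags, whereas an Intersect with a non-zero count matches any of them. Here is a minimal in-memory sketch of the all-tags variant, assuming LINQ to Objects. Whether a particular LINQ provider can translate it to SQL is not something this sketch establishes. The Entity class, its Name property, and TagSearch.WithAllTags are hypothetical names used only for illustration.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical entity type; only the Tags property comes from the question.
    public class Entity
    {
        public string Name { get; set; }
        public IList<string> Tags { get; set; }
    }

    public static class TagSearch
    {
        // Returns the entities whose Tags contain every one of searchTags.
        public static List<Entity> WithAllTags(IEnumerable<Entity> entities, IList<string> searchTags)
        {
            return entities
                // Except yields the search tags the entity lacks; none missing means a match.
                .Where(e => !searchTags.Except(e.Tags).Any())
                .ToList();
        }
    }
    ```

    For entities tagged {linq, sql}, {linq} and {linq, sql, c#}, searching for linq and sql keeps the first and the third.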

    Tigraine : I just added a clarification to the question and will try it. I thought about Intersect, but didn't follow up on it.
  • From here, this is some SQL that will work for you:

    SELECT entityID
    FROM tags
    WHERE tagID in (...) --taglist
    GROUP BY entityID
    HAVING COUNT(DISTINCT tagID) = ... --tagcount
    

    Now the trick is getting Linq to produce it... Here's some LinqToSql code:

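    public List<int> GetEntityIds(List<int> tagIds)
    {
      // Count distinct tags so duplicates in the input cannot break the match.
      int tagCount = tagIds.Distinct().Count();
    
      CustomDataContext myDC = new CustomDataContext();
    
      List<int> entityIds = myDC.Tags
        .Where(t => tagIds.Contains(t.TagId))   // WHERE tagID IN (...)
        .GroupBy(t => t.entityId)               // GROUP BY entityID
        // HAVING COUNT(DISTINCT tagID) = tagCount
        .Where(g => g.Select(t => t.TagId).Distinct().Count() == tagCount)
        .Select(g => g.Key)
        .ToList();                              // runs the query
    
      return entityIds;
    }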
    

    A few caveats apply:

    • List(T).Contains is translated by LinqToSql, but the first version of LinqToEntities (.NET 3.5 SP1) will not translate it; you get a runtime exception instead. Entity Framework 4 added support for it.
    • IList.Contains... nobody translates that. Use List(T) instead.
    • SQL Server limits a query to 2100 parameters, and each tag in the list becomes one parameter. If you need to search for more than about 2000 tags at once, you should seek a different solution.
    • I wrote this without tools, after midnight. It's probably not perfect.
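
    To try the WHERE / GROUP BY / HAVING idea without a database, the same pipeline can be sketched over an in-memory list, assuming LINQ to Objects. TagRow, TagDivision, and the EntityId/TagId property names are hypothetical stand-ins for the tags table above, not part of any real schema.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical row shape for the tags table: one row per (entity, tag) pair.
    public class TagRow
    {
        public int EntityId { get; set; }
        public int TagId { get; set; }
    }

    public static class TagDivision
    {
        // Returns the ids of entities that carry every tag in tagIds.
        public static List<int> GetEntityIds(IEnumerable<TagRow> rows, IEnumerable<int> tagIds)
        {
            // A set removes duplicate input tags, so Count is the distinct tag count.
            var wanted = new HashSet<int>(tagIds);

            // With an empty tag list no row passes the filter, so the result is empty.
            return rows
                .Where(r => wanted.Contains(r.TagId))   // WHERE tagID IN (...)
                .GroupBy(r => r.EntityId)               // GROUP BY entityID
                // HAVING COUNT(DISTINCT tagID) = number of wanted tags
                .Where(g => g.Select(r => r.TagId).Distinct().Count() == wanted.Count)
                .Select(g => g.Key)
                .ToList();
        }
    }
    ```

    An entity survives only when its group holds as many distinct matching tags as were requested, which is why duplicate rows and duplicate input tags both have to be collapsed before counting.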
