Wednesday, April 6, 2011

Python Hash Functions

What is a good way of hashing a hierarchy (similar to a file structure) in python?

I could convert the whole hierarchy into a dotted string and then hash that, but is there a better (or more efficient) way of doing this without going back and forth all the time?

An example of a structure I might want to hash is:

a -> b1 -> c -> 1 -> d
a -> b2 -> c -> 2 -> d
a -> c -> 1 -> d
From stackoverflow
  • If you have access to your hierarchy components as a tuple, just hash it - tuples are hashable. You may not gain a lot over conversion to and from a delimited string, but it's a start.

    If this doesn't help, perhaps you could provide more information about how you store the hierarchy/path information.

    Javier : +1. python isn't javascript, dictionary keys can be more than just strings. unfortunately, it's not Lua either, where keys can be _any_ value
  • You can make any object hashable by implementing the __hash__() method

    So you can simply add a suitable __hash__() method to the objects storing your hierarchy, e.g. compute the hash recursively, etc.

  • How do you want to access your hierarchy?

    If you're always going to be checking for a full path, then as suggested, use a tuple: eg:

    >>> d["a","b1","c",1,"d"] = value
    

    However, if you're going to be doing things like "quickly find all the items below "a -> b1", it may make more sense to store it as a nested hashtable (otherwise you must iterate through all items to find those you're intereted in).

    For this, a defaultdict is probably the easiest way to store. For example:

    from collections import defaultdict
    
    def new_dict(): return defaultdict(new_dict)
    d = defaultdict(new_dict)
    
    d["a"]["b1"]["c"][1]["d"] = "test"
    d["a"]["b2"]["c"][2]["d"] = "test2"
    d["a"]["c"][1]["d"] = "test3"
    
    print d["a"]["c"][1]["d"]  # Prints test3
    print d["a"].keys()        # Prints ["c", "b1", "b2"]
    

0 comments:

Post a Comment