I want to hold a bunch of const char pointers into an std::set container [1]. std::set template requires a comparator functor, and the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.
I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them, but is there a cleaner, 'standard' way of doing it?
Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.
1: The pointers are to static strings, so there is no problem with their lifetimes - they won't go away.
-
Depending on how big a "bunch" is, I would be inclined to store a corresponding bunch of
std::string
s in the set. That way you won't have to write any extra glue code.From Greg Hewgill -
The "optimized way"
If we ignore the "premature optimization is the root of all evil", the standard way is to add a comparator, which is easy to write:
struct MyCharComparator { bool operator()(const char * A, const char * B) const { return (strcmp(A, B) < 0) ; } } ;
To use with a:
std::set<const char *, MyCharComparator>
The standard way
Use a:
std::set<std::string>
It will work even if you put a static const char * inside (because std::string, unlike const char *, is comparable by its contents).
Of course, if you need to extract the data, you'll have to extract the data through std::string.c_str(). In the other hand, , but as it is a set, I guess you only want to know if "AAA" is in the set, not extract the value "AAA" of "AAA".
Note: I did read about "Please do not suggest creating std::strings", but then, you asked the "standard" way...
The "never do it" way
I noted the following comment after my answer:
Please do not suggest creating std::strings - it is a waste of time and space. The strings are static, so they can be compared for (in)equality based on their address.
This smells of C (use of the deprecated "static" keyword, probable premature optimization used for std::string bashing, and string comparison through their addresses).
Anyway, you don't want to to compare your strings through their address. Because I guess the last thing you want is to have a set containing:
{ "AAA", "AAA", "AAA" }
Of course, if you only use the same global variables to contain the string, this is another story.
In this case, I suggest:
std::set<const char *>
Of course, it won't work if you compare strings with the same contents but different variables/addresses.
And, of course, it won't work with static const char * strings if those strings are defined in a header.
But this is another story.
xtofl : That's right. I just would make the operator const. Hey, didn't I just type that?paercebal : :-D ... you're right !fizzer : "of course it won't work with static const char* strings if those strings are defined in a header." Why not? What's magic about headers?paercebal : If a *static* string (say "A") is defined in a header, if that header is included in three different sources, and if you add in the set "A" once, in each source, then you'll have potentially 3 different "A", with different addresses, which I guess, was not the point of putting the string in the headflorin : Potentially, although modern linkers usually optimize identical constant strings (and the most aggressive one even substrings) to share the same storage.paercebal : You're right, but this optimization depends on: 1 - You use the optimization, which in debug, usually, you won't (ie the code above won't work on debug). 2 - It is a built-in type, because objects have constructors/destructors. All in all, it is an optimization, not a feature we should rely upon.paercebal : The conclusion is that we can't rely on potential optimization to have the compiler merge 3 similar strings defined in 3 different compilation units. And using static on a string defined on a header included in three sources is exactly that. This is a reason the "static" is a bad idea in this case.paercebal : (of course, it was tested on g++ 4.2.4)From paercebal -
Must the set contain
const char*
?What immediately springs to mind is storing the strings in a
std::string
instead, and putting those into thestd::set
. This will allow comparisons without a problem, and you can always get the rawconst char*
with a simple function call:const char* data = theString.c_str();
From Matt Dillard -
The standard way of doing it, if you don't want to wrap them in
std::string
s, is to define a functor class:struct ConstCharStarComparator { bool operator()(const char *s1, const char *s2) const { return strcmp(s1, s2) < 0; } } typedef std::set<onst char *, ConstCharStarComparator> stringset_t; stringset_t myStringSet;
xtofl : That's right. I just would make the operator const.Adam Rosenfield : Whoops, thanks, fixed now.From Adam Rosenfield -
Presumably you don't want to use std::string because of performance reasons.
I'm running MSVC and gcc, and they both seem to not mind this:
bool foo = "blah" < "grar";
EDIT: However, the behaviour in this case is unspecified. See comments...
They also don't complain about
std::set<const char*>
.If you're using a compiler that does complain, I would probably go ahead with your suggested functor that casts the pointers to
int
s.Edit: Hey, I got voted down... Despite being one of the few people here that most directly answered his question. I'm new to Stack Overflow, is there any way to defend yourself if this happens? That being said, I'll try to right here:
The question is not looking for
std::string
solutions. Every time you enter anstd::string
in to the set, it will need to copy the entire string (until C++0x is standard, anyway). Also, every time you do a set look-up, it will need to do multiple string compares.Storing the pointers in the set, however, incurs NO string copy (you're just copying the pointer around) and every comparison is a simple integer comparison on the addresses, not a string compare.
The question stated that storing the pointers to the strings was fine, I see no reason why we should all immediately assume that this statement was an error. If you know what you're doing, then there are considerable performance gains to using a
const char*
over eitherstd::string
or a custom comparison that callsstrcmp
. Yes, it's less safe, and more prone to error, but these are common trade-offs for performance, and since the question never stated the application, I think we should assume that he's already considered the pros and cons and decided in favor of performance.fizzer : Don't sweat the downvote - it happens all the time. Confident but clueless posts tend to get upvoted and accepted. I guess it's still a beta & will sort itself in time. I think it would be polite if the downvoters commented, though.fizzer : FWIW - your '<' example is unspecified behaviour in C++ (undefined in C), but the set is OK. I didn't vote you down though.Andrew Top : Thanks for the comments. Bummer, but oh well... SO's evolution should be interesting. How about that unspecified behaviour. I guess I should warn about that. Thanks.Mark Ransom : I didn't give the downvote, but it was well deserved. Your solution fails to sort the input by anything relevant, and doesn't work for two identical strings that are at different addresses. Just because it compiles doesn't mean it works.Andrew Top : "The strings are static, so they can be compared for (in)equality based on their address". I understand that this may be a bad idea, but I was answering his question, not giving my opinion about best practices. Also, the concept of a set doesn't exist to sort data, it exists for membership testing.From Andrew Top -
Either use a comparator, or use a wrapper type to be contained in the set. (Note:
std::string
is a wrapper, too....)const char* a("a"); const char* b("b"); struct CWrap { const char* p; bool operator<(const CWrap& other) const{ return strcmp( p, other.p ) < 0; } CWrap( const char* p ): p(p){} }; std::set<CWrap> myset; myset.insert(a); myset.insert(b);
From xtofl -
Others have already posted plenty of solutions showing how to do lexical comparisons with
const char*
, so I won't bother.Please do not suggest creating std::strings - it is a waste of time and space.
If
std::string
is a waste of time and space, thenstd::set
might be a waste of time and space as well. Each element in astd::set
is allocated separately from the free store. Depending on how your program uses sets, this may hurt performance more thanstd::set
's O(log n) lookups help performance. You may get better results using another data structure, such as a sortedstd::vector
, or a statically allocated array that is sorted at compile time, depending on the intended lifetime of the set.the standard C++ library offers std::less, but its implementation is based on comparing the two keys directly, which is not standard for pointers.
The strings are static, so they can be compared for (in)equality based on their address.
That depends on what the pointers point to. If all of the keys are allocated from the same array, then using
operator<
to compare pointers is not undefined behavior.Example of an array containing separate static strings:
static const char keys[] = "apple\0banana\0cantaloupe";
If you create a
std::set<const char*>
and fill it with pointers that point into that array, their ordering will be well-defined.If, however, the strings are all separate string literals, comparing their addresses will most likely involve undefined behavior. Whether or not it works depends on your compiler/linker implementation, how you use it, and your expectations.
If your compiler/linker supports string pooling and has it enabled, duplicate string literals should have the same address, but are they guaranteed to in all cases? Is it safe to rely on linker optimizations for correct functionality?
If you only use the string literals in one translation unit, the set ordering may be based on the order that the strings are first used, but if you change another translation unit to use one of the same string literals, the set ordering may change.
I know I can define my own functor and implement the operator() by casting the pointers to integers and comparing them
Casting the pointers to
uintptr_t
would seem to have no benefit over using pointer comparisons. The result is the same either way: implementation-specific.From bk1e -
Just go ahead and use the default ordering which is less<>. The Standard guarantees that less will work even for pointers to different objects:
"For templates greater, less, greater_equal, and less_equal, the specializations for any pointer type yield a total order, even if the built-in operators <, >, <=, >= do not."
The guarantee is there exactly for things like your
set<const char*>
.Richard Corden : I think this shows why *everybody* should have a copy of the standard (or a good draft) which they can check. First answer in 6 that, IMHO, answers the question fully (+1)florin : This is the correct answer (even though it does not solve my problem, since I have to use an older compiler which lacks the required specialization). I tried to downvote the wrong answers, but the wisdom of the crowds beat me...From fizzer
0 comments:
Post a Comment