Android 프로젝트를 하다보면 hasCode() 와 equals()를 Override 하여 사용하는 것을 볼 수 있었다. 이렇게 사용하는 이유를 찾아보니 IBM에서 제공하는 Developer Works 에서 아래와 같은 내용을 찾을 수 있었다.
- ( 출처 : http://www.ibm.com/developerworks/java/library/j-jtp05273/index.html )
Hashing it out
Defining hashCode() and equals() effectively and correctly
While the Java language does not provide direct support for associative arrays -- arrays that can take any object as an index -- the presence of the
hashCode()
method in the rootObject
class clearly anticipates the ubiquitous use ofHashMap
(and its predecessor,Hashtable
). Under ideal conditions, hash-based containers offer both efficient insertion and efficient retrieval; supporting hashing directly in the object model facilitates the development and use of hash-based containers.Defining equality
The
Object
class has two methods for making inferences about an object's identity:equals()
andhashCode()
. In general, if you override one of these methods, you must override both, as there are important relationships between them that must be maintained. In particular, if two objects are equal according to theequals()
method, they must have the samehashCode()
value (although the reverse is not generally true).The semantics of
equals()
for a given class are left to the implementer; defining whatequals()
means for a given class is part of the design work for that class. The default implementation, provided byObject
, is simply reference equality:123public boolean equals(Object obj) {
return (this == obj);
}
Under this default implementation, two references are equal only if they refer to the exact same object. Similarly, the default implementation of
hashCode()
provided byObject
is derived by mapping the memory address of the object to an integer value. Because on some architectures the address space is larger than the range of values forint
, it is possible that two distinct objects could have the samehashCode()
. If you overridehashCode()
, you can still use theSystem.identityHashCode()
method to access this default value.Overriding equals() -- a simple example
An identity-based implementation for
equals()
andhashCode()
is a sensible default, but for some classes, it is desirable to relax the definition of equality somewhat. For example, theInteger
class definesequals()
similarly to this:1234public boolean equals(Object obj) {
return (obj instanceof Integer
&& intValue() == ((Integer) obj).intValue());
}
Under this definition, two
Integer
objects are equal only if they contain the same integer value. This, along withInteger
being immutable, makes it practical to use anInteger
as a key in aHashMap
. This value-based approach to equality is used by all the primitive wrapper classes in the Java class library, such asInteger
,Float
,Character
, andBoolean
, as well asString
(twoString
objects are equal if they contain the same sequence of characters). Because these classes are immutable and implementhashCode()
andequals()
sensibly, they all make good hash keys.Why override equals() and hashCode()?
What would happen if
Integer
did not overrideequals()
andhashCode()
? Nothing, if we never used anInteger
as a key in aHashMap
or other hash-based collection. However, if we were to use such anInteger
object for a key in aHashMap
, we would not be able to reliably retrieve the associated value, unless we used the exact sameInteger
instance in theget()
call as we did in theput()
call. This would require ensuring that we only use a single instance of theInteger
object corresponding to a particular integer value throughout our program. Needless to say, this approach would be inconvenient and error prone.The interface contract for
Object
requires that if two objects are equal according toequals()
, then they must have the samehashCode()
value. Why does our root object class needhashCode()
, when its discriminating ability is entirely subsumed by that ofequals()
? ThehashCode()
method exists purely for efficiency. The Java platform architects anticipated the importance of hash-based collection classes -- such asHashtable
,HashMap
, andHashSet
-- in typical Java applications, and comparing against many objects withequals()
can be computationally expensive. Having every Java object supporthashCode()
allows for efficient storage and retrieval using hash-based collections.Requirements for implementing equals() and hashCode()
There are some restrictions placed on the behavior of
equals()
andhashCode()
, which are enumerated in the documentation forObject
. In particular, theequals()
method must exhibit the following properties:- Symmetry: For two references,
a
andb
,a.equals(b)
if and only ifb.equals(a)
- Reflexivity: For all non-null references,
a.equals(a)
- Transitivity: If
a.equals(b)
andb.equals(c)
, thena.equals(c)
- Consistency with
hashCode()
: Two equal objects must have the samehashCode()
value
The specification for
Object
offers a vague guideline thatequals()
andhashCode()
be consistent -- that their results will be the same for subsequent invocations, provided that "no information used in equals comparison on the object is modified." This sounds sort of like "the result of the calculation shouldn't change, unless it does." This vague statement is generally interpreted to mean that equality and hash value calculations should be a deterministic function of an object's state and nothing else.What should equality mean?
The requirements for
equals()
andhashCode()
imposed by the Object class specification are fairly simple to follow. Deciding whether, and how, to overrideequals()
requires a little more judgment. In the case of simple immutable value classes, such asInteger
(and in fact for nearly all immutable classes), the choice is fairly obvious -- equality should be based on the equality of the underlying object state. In the case ofInteger
, the object's only state is the underlying integer value.For mutable objects, the answer is not always so clear. Should
equals()
andhashCode()
be based on the object's identity (like the default implementation) or the object's state (like Integer and String)? There's no easy answer -- it depends on the intended use of the class. For containers likeList
andMap
, one could have made a reasonable argument either way. Most classes in the Java class library, including container classes, err on the side of providing anequals()
andhashCode()
implementation based on the object state.If an object's
hashCode()
value can change based on its state, then we must be careful when using such objects as keys in hash-based collections to ensure that we don't allow their state to change when they are being used as hash keys. All hash-based collections assume that an object's hash value does not change while it is in use as a key in the collection. If a key's hash code were to change while it was in a collection, some unpredictable and confusing consequences could follow. This is usually not a problem in practice -- it is not common practice to use a mutable object like aList
as a key in aHashMap
.An example of a simple mutable class that defines
equals()
andhashCode()
based on its state isPoint
. TwoPoint
objects are equal if they refer to the same(x, y)
coordinates, and the hash value of aPoint
is derived from the IEEE 754-bit representation of thex
andy
coordinate values.For more complex classes, the behavior of
equals()
andhashCode()
may even be imposed by the specification of a superclass or interface. For example, theList
interface requires that aList
object is equal to another object if and only if the other object is also aList
and they contain the same elements (defined byObject.equals()
on the elements) in the same order. The requirements forhashCode()
are defined with even more specificity -- thehashCode()
value of a list must conform to the following calculation:123456hashCode = 1;
Iterator i = list.iterator();
while (i.hasNext()) {
Object obj = i.next();
hashCode = 31*hashCode + (obj==null ? 0 : obj.hashCode());
}
Not only is the hash value dependent on the contents of the list, but the specific algorithm for combining the hash values of the individual elements is specified as well. (The
String
class specifies a similar algorithm to be used for computing the hash value of aString
.)Writing your own equals() and hashCode() methods
Overriding the default
equals()
method is fairly easy, but overriding an already overriddenequals()
method can be extremely tricky to do without violating either the symmetry or transitivity requirement. When overridingequals()
, you should always include some Javadoc comments onequals()
to help those who might want to extend your class do so correctly.As a simple example, consider the following class:
12345class A {
final B someNonNullField;
C someOtherField;
int someNonStateField;
}
How would we write the
equals()
method for this class? This way is suitable for many situations:12345678910111213public boolean equals(Object other) {
// Not strictly necessary, but often a good optimization
if (this == other)
return true;
if (!(other instanceof A))
return false;
A otherA = (A) other;
return
(someNonNullField.equals(otherA.someNonNullField))
&& ((someOtherField == null)
? otherA.someOtherField == null
: someOtherField.equals(otherA.someOtherField)));
}
Now that we've defined
equals()
, we have to definehashCode()
in a compatible manner. One compatible, but not all that useful, way to definehashCode()
is like this:1public int hashCode() { return 0; }
This approach will yield horrible performance for
HashMap
s with a large number of entries, but it does conform to the specification. A more sensible implementation ofhashCode()
forA
would be like this:1234567public int hashCode() {
int hash = 1;
hash = hash * 31 + someNonNullField.hashCode();
hash = hash * 31
+ (someOtherField == null ? 0 : someOtherField.hashCode());
return hash;
}
Note that both of these implementations delegate a portion of the computation to the
equals()
orhashCode()
method of the state fields of the class. Depending on your class, you may also want to delegate part of the computation to theequals()
orhashCode()
function of the superclass. For primitive fields, there are helper functions in the associated wrapper classes that can help in creating hash values, such asFloat.floatToIntBits
.Writing an
equals()
method is not without pitfalls. In general, it is impractical to cleanly overrideequals()
when extending an instantiable class that itself overridesequals()
, and writing anequals()
method that is intended to be overridden (such as in an abstract class) is done differently than writing anequals()
method for a concrete class. See Effective Java Programming Language Guide, Item 7 for some examples and more details about why this is so.Room for improvement?
Building hashing into the root object class of the Java class library was a very sensible design compromise -- it makes using hash-based containers so much easier and more efficient. However, several criticisms have been made of the approach to and implementation of hashing and equality in the Java class library. The hash-based containers in
java.util
are very convenient and easy to use, but may not be suitable for applications that require very high performance. While most of these will never be changed, it is worthwhile to keep in mind when you're designing applications that rely heavily on the efficiency of hash-based containers. These criticisms include:- Too small a hash range. Using
int
, instead oflong
, for the return type ofhashCode()
increases the possibility of hash collisions. - Bad distribution of hash values. The hash values for short strings and small integers are themselves small integers, and are close to the hash values of other "nearby" objects. A more well-behaved hash function would distribute the hash values more evenly across the hash range.
- No defined hashing operations. While some classes, such as
String
andList
, define a hash algorithm to be used in combining the hash values of its constituent elements into a single hash value, the language specification does not define any approved means of combining the hash values of multiple objects into a new hash value. The trick used byList
,String
, or the example classA
discussed earlier in Writing your own equals() and hashCode() methods are simple, but far from mathematically ideal. Nor does the class library offer convenience implementations of any hashing algorithm that would simplify the creation of more sophisticatedhashCode()
implementations. - Difficulty writing
equals()
when extending an instantiable class that already overridesequals()
. The "obvious" ways to defineequals()
when extending an instantiable class that already overridesequals()
all fail to meet the symmetry or transitivity requirements of theequals()
method. This means that you must understand the structure and implementation details of classes you are extending when overridingequals()
, and may even need to expose private fields in the base class as protected to do so, which violates principles of good object-oriented design.
Summary
By defining
equals()
andhashCode()
consistently, you can improve the usability of your classes as keys in hash-based collections. There are two approaches to defining equality and hash value: identity-based, which is the default provided byObject
, and state-based, which requires overriding bothequals()
andhashCode()
. If an object's hash value can change when its state changes, be sure you don't allow its state to change while it is being used as a hash key.- Symmetry: For two references,