What are hashcodes?
By Prasad Kharkar
What exactly is a hashcode? Wikipedia defines hash codes as follows.
A hash function is an algorithm that maps data of variable length to data of fixed length. The values returned by hash functions are called hash values, hash codes, hash sums, check sums or simply hashes.
Java's hashCode() method does the same, but there are a few other points to consider when studying hashcodes.
- Hashcodes are typically used to enhance the performance of large collections of data.
- Collections such as HashSet and HashMap use the hashcode of an object to determine where the object should be stored in the collection.
- When an object is searched for in a collection that uses hashcodes, the object's hashcode is calculated first and then used to locate the object.
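The points above can be seen with Java's own String and HashSet classes. The following is a minimal sketch; the class name HashCodeBasics and the helper sameHash are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class HashCodeBasics {
    // Hypothetical helper: reports whether two strings have the same hash code.
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String first = "dave";
        String second = new String("dave"); // a distinct object with equal content

        System.out.println(first.equals(second));    // true: equal content
        System.out.println(sameHash(first, second)); // true: equal objects share a hash code

        Set<String> names = new HashSet<>();
        names.add(first);
        // contains() uses the argument's hash code to narrow the search,
        // then confirms the match with equals().
        System.out.println(names.contains(second));  // true
    }
}
```

Equal objects must report equal hash codes, which is why the lookup with a different but equal String object still succeeds.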
Let us first understand what exactly hashcodes are. Consider a scenario in which we want to store multiple strings in a collection, say a HashSet.
- We will number all the letters from 1 to 26, i.e. a = 1, b = 2, c = 3 and so on until z = 26.
- Our hash function returns the sum of the values of the letters in the string.
For example, say we have three names: dave, matt and vade. We will calculate the hashcodes of these three names using our hash function.
For dave, d = 4, a = 1, v = 22 and e = 5, so dave = 4 + 1 + 22 + 5 = 32.
For matt, m = 13, a = 1, t = 20 and t = 20, so matt = 13 + 1 + 20 + 20 = 54.
For vade, v = 22, a = 1, d = 4 and e = 5, so vade = 22 + 1 + 4 + 5 = 32.
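The toy hash function used in these calculations can be sketched in Java as follows. This is only an illustration of the letter-sum scheme from this article, not how Java's String.hashCode() works; the names ToyHash and toyHash are invented here.

```java
class ToyHash {
    // Toy hash from the example: a = 1 ... z = 26, summed over the string.
    // Assumes lowercase input; any other character is ignored.
    static int toyHash(String name) {
        int sum = 0;
        for (char c : name.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sum += c - 'a' + 1;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"dave", "matt", "vade"}) {
            System.out.println(name + " -> " + toyHash(name));
        }
        // prints:
        // dave -> 32
        // matt -> 54
        // vade -> 32
    }
}
```

Note that dave and vade produce the same value even though they are different names. Two different objects sharing a hashcode is called a collision.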
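Now suppose another name is added to our collection, and this time it is one we already have: dave. (A real HashSet would reject this duplicate, because a set never holds two equal elements; for the sake of illustration, imagine a hash-based collection that keeps both.) There are now two equal names in our collection. For the newly added name,
For dave, d = 4, a = 1, v = 22 and e = 5, so dave = 4 + 1 + 22 + 5 = 32.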
Think of the hashcodes as groups and the names as group members. A member is assigned to a group based on some criterion, in this case its hashcode. There are two distinct hashcodes here, 32 and 54, so we have two groups. The three members vade, dave and dave go to the 32 group, and matt goes to the 54 group.
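The grouping described here can be sketched with a map from hashcode to a list of names. This is a simplified model under the article's toy hash, not the internal layout of HashSet or HashMap; ToyBuckets, toyHash and group are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ToyBuckets {
    // Toy hash from the example: a = 1 ... z = 26, summed (lowercase input assumed).
    static int toyHash(String name) {
        int sum = 0;
        for (char c : name.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sum += c - 'a' + 1;
            }
        }
        return sum;
    }

    // Places every name into the group (bucket) that matches its hashcode.
    // Duplicates are kept on purpose, to mirror the example in the text.
    static Map<Integer, List<String>> group(List<String> names) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String name : names) {
            buckets.computeIfAbsent(toyHash(name), k -> new ArrayList<>()).add(name);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList("dave", "matt", "vade", "dave")));
        // prints: {32=[dave, vade, dave], 54=[matt]}
    }
}
```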
Now suppose we want to find an object in the collection, say matt. The search is based on the hashcode. The hashcode of matt is calculated as 54, so the names in the 32 group need not be compared at all because their hashcode is different. Only the 54 group is examined, and matt is found there immediately.
Now suppose we search for dave. Its hashcode is calculated as 32. There are three names with this hashcode, so the candidates must now pass an equality test. dave is first compared with vade: no match, because although the hashcode is the same, the value is different. Next, dave is compared with dave. The names are the same, so the equality test returns true.
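The two searches just described can be sketched as a lookup that first picks the group by hashcode and then runs equals() only inside that group. As before, this is a simplified model built on the toy hash; ToyLookup, group and comparisonsToFind are hypothetical names, and the names are inserted in the order vade, dave, matt, dave so that vade is compared first, as in the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyLookup {
    // Toy hash from the example: a = 1 ... z = 26, summed (lowercase input assumed).
    static int toyHash(String name) {
        int sum = 0;
        for (char c : name.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sum += c - 'a' + 1;
            }
        }
        return sum;
    }

    // Builds the groups, keeping duplicates to mirror the example.
    static Map<Integer, List<String>> group(String... names) {
        Map<Integer, List<String>> buckets = new HashMap<>();
        for (String name : names) {
            buckets.computeIfAbsent(toyHash(name), k -> new ArrayList<>()).add(name);
        }
        return buckets;
    }

    // Returns the number of equals() calls needed to find target, or -1 if it is absent.
    static int comparisonsToFind(Map<Integer, List<String>> buckets, String target) {
        List<String> bucket = buckets.get(toyHash(target));
        if (bucket == null) {
            return -1; // no group for this hashcode, so nothing needs comparing
        }
        int comparisons = 0;
        for (String candidate : bucket) {
            comparisons++;
            if (candidate.equals(target)) {
                return comparisons;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> buckets = group("vade", "dave", "matt", "dave");
        System.out.println(comparisonsToFind(buckets, "matt")); // 1
        System.out.println(comparisonsToFind(buckets, "dave")); // 2 (vade first, then dave)
        System.out.println(comparisonsToFind(buckets, "anna")); // -1 (hashcode 30 has no group)
    }
}
```

Without the grouping, finding a name would mean comparing it against every stored name in turn.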
- From this we can conclude that storing and retrieving objects is efficient when hashcodes are used, because a lookup only compares objects that share the same hashcode.
In the next tutorial, we will look into collections and the equals and hashCode implementations.
Awesome explanation, that even a layman can understand it so easily.
Hi Prasad, as HashSet is the collection being used and since set does not allow duplicates, the second “dave” will not be added. Right?
Hi Prasad, you are right. In a HashSet, duplicates are not allowed and hence two “dave” values cannot be there. I know this article is causing a bit of confusion 🙂
The only thing you should know is that hashcode values are calculated by some algorithm that categorises objects into buckets, i.e. hashcodes, so that when an object is being searched for, Java doesn't have to compare all objects for equality. Using the word “HashSet” is a bit wrong in the above article 🙂
This article needs to only focus on how hashcodes are calculated and how they work. I will correct the article soon.
Visit below link to learn more about significance of equals and hashcode.
http://www.thejavageek.com/2013/06/28/significance-of-equals-and-hashcode/