MD5 Hash in Python

What are hash functions?

A hash function takes input data of any size and produces a fixed-size output called a message digest or hash. This value can be used to verify the integrity of data. MD5 belongs to the family of cryptographic hash functions.

When data is transmitted over a network, a hash value can be sent along with it. When the receiver gets the data, it calculates the hash of what arrived and compares it with the attached hash value. The data is accepted only if the two values match; a mismatch means the data has changed. The change could be due to data loss, noise, or tampering by a third party. Thus, the hash value is used to verify the integrity of the data.
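The check described above can be sketched in a few lines. This is only an illustration: the helper names `make_packet` and `is_intact` are hypothetical and not part of any library, and a real protocol would define its own packet layout.

```python
from hashlib import md5


def make_packet(payload):
    """Sender side (hypothetical helper): pair the payload with its MD5 digest."""
    return payload, md5(payload).hexdigest()


def is_intact(payload, attached_digest):
    """Receiver side (hypothetical helper): recompute the digest and compare."""
    return md5(payload).hexdigest() == attached_digest


payload, digest = make_packet(b"hello world")

# Unchanged data passes the check.
print(is_intact(payload, digest))

# A single altered character makes the check fail.
print(is_intact(b"hello w0rld", digest))
```

A plain hash like this catches accidental changes. Someone who can alter the data in transit can usually alter the attached hash too, so it does not protect against deliberate tampering on its own.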

Hash functions always give an output of a fixed length that does not depend on the size of the input data. Thus, arbitrarily large data can be mapped to a fixed-size output. For this reason, hash functions are sometimes described as compression functions.
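As a quick check of the fixed-length property, the sketch below hashes inputs of very different sizes with MD5 from Python's hashlib library and prints the digest length each time. The input sizes are arbitrary choices for illustration.

```python
from hashlib import md5

# Input sizes in bytes, chosen arbitrarily for the demonstration.
sizes = (0, 1, 1_000, 1_000_000)

digest_lengths = []
for size in sizes:
    data = b"a" * size
    digest = md5(data).digest()
    digest_lengths.append(len(digest))
    print(f"{size:>9} input bytes -> {len(digest)} digest bytes")
```

Every line reports 16 digest bytes, which is 128 bits, regardless of how large the input was.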

Applications of hash

Hash functions are used in message digests, digital signatures, the hash map data structure, password verification, and other cryptographic applications.

MD5 hash function

The MD5 hash function is commonly used to verify the integrity of data. It has known cryptographic vulnerabilities, so it should not be used in security-sensitive applications such as digital signatures or password storage. It is still suitable for detecting accidental changes in data or comparing files, provided deliberate tampering is not a concern.

MD5 produces a 128-bit (16-byte) digest. Even if the input file is gigabytes in size, the output is always 128 bits. Changing even one bit of the input results in a completely different hash value. This is called the avalanche effect.
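The avalanche effect can be observed directly. The sketch below flips a single bit of an input and counts how many bits of the MD5 digest change; `differing_bits` is a hypothetical helper written for this illustration.

```python
from hashlib import md5


def differing_bits(a, b):
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"Hello everyone!"
# Flip the lowest bit of the first byte: 'H' (0x48) becomes 'I' (0x49).
modified = bytes([original[0] ^ 0x01]) + original[1:]

d1 = md5(original).digest()
d2 = md5(modified).digest()

print(md5(original).hexdigest())
print(md5(modified).hexdigest())
print("input bits changed: ", differing_bits(original, modified))
print("digest bits changed:", differing_bits(d1, d2), "of", len(d1) * 8)
```

For a well-mixing hash, roughly half of the 128 digest bits are expected to change, although the exact count varies from input to input.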

Python MD5 Hash Implementation

Python MD5 Hash Function

Python's hashlib library contains different hash functions, including MD5 and several SHA variants. We will use this library to perform hashing.

Code:

# importing the library
from hashlib import md5

# the input data
input_string = 'Hello everyone!'

# hashlib requires the input to be in the form of bytes
# encode converts the string into bytes (UTF-8 by default)
hash_value = md5(input_string.encode())

# message digest in bytes
print("Hash value as bytes:", hash_value.digest())

# message digest as hexadecimal digits
print("Hash value as hexadecimal:", hash_value.hexdigest())

Output:

Output for Python MD5 Hash Function

The hashlib library requires input as bytes, so we use the encode method to convert strings to bytes. Calling the md5 function creates an MD5 hash object. More data can be fed into this object later with the update method.

Suppose we created the hash object with string A and later called update with string B. The output will be the same as calling md5 on A + B.

Python MD5 Hexadecimal Method

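The hexdigest method returns the digest as a string of hexadecimal digits. The MD5 digest is 128 bits (16 bytes) and each byte is written as two hexadecimal digits, so the output is always 32 characters long.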
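The relationship between the two output forms can be checked with a short sketch. The input `b"abc"` is just a convenient test value.

```python
from hashlib import md5

h = md5(b"abc")

raw = h.digest()      # bytes object holding the 16-byte digest
text = h.hexdigest()  # str holding the same digest as 32 hex characters

print(len(raw), len(text))
# Converting the raw bytes to hex gives the same string as hexdigest().
print(raw.hex() == text)
print(text)
```

Both methods describe the same 128-bit value. The digest method is convenient for binary protocols, while hexdigest is easier to print, log, or compare by eye.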

Code:

# importing the library
from hashlib import md5

# input string 1
input_string_1 = "Hello everyone!"

# call the hash function
hash_value = md5(input_string_1.encode())

# input_string 2
input_string_2 = "Nice to meet you."
hash_value.update(input_string_2.encode())

# hash value in hexadecimal
print("Hash value using update:", hash_value.hexdigest())

# hash of the whole string
whole_string = input_string_1 + input_string_2
hash_whole_value = md5(whole_string.encode())
print("Hash value using the full string:", hash_whole_value.hexdigest())

Output:

Output for Python MD5 Hexadecimal Method

We get the same value after using the update method. This is useful when we don’t get the whole data in one go. Notice that the input here is longer than in the previous example, but the output length remains the same.
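A common case where the data does not arrive in one go is hashing a large file. The sketch below reads a file in fixed-size chunks and feeds each chunk to update, so the whole file never has to be held in memory. The function name `md5_of_file` and the 64 KiB chunk size are assumptions made for this example.

```python
import os
import tempfile
from hashlib import md5


def md5_of_file(path, chunk_size=65536):
    """Hash a file incrementally (hypothetical helper, chunk size is arbitrary)."""
    h = md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Usage: write a small temporary file, hash it, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
try:
    file_hash = md5_of_file(tmp.name)
    print(file_hash)
finally:
    os.remove(tmp.name)
```

Because update appends to the data already seen, the result is identical to hashing the entire file contents in a single call.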

Conclusion

We have seen how to use the MD5 hash function in Python. Although it is still commonly used, it is not the most efficient or secure hash function.

  • It is prone to brute-force attacks. MD5 is very cheap to compute, so modern hardware can try an enormous number of candidate inputs per second. This makes it practical to recover short or guessable inputs, such as passwords, from their MD5 hashes far faster than with hashes that are deliberately slow.
  • It has low collision resistance. Practical attacks can construct two different inputs with the same MD5 hash, so a matching hash does not prove that the data is untampered.
  • MD5 is slower than some modern hash functions, such as BLAKE2.

Thus, it is advisable to use a hash from the SHA-2 or SHA-3 family, such as SHA-256, for cryptographic use cases, and other faster functions when we have to hash huge data files.
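Switching to another hash in hashlib needs very little change, because the hash objects share the same update, digest, and hexdigest methods. The sketch below uses SHA-256 and BLAKE2b as two examples; which function suits a given application is a judgment call, not something this snippet decides.

```python
import hashlib

data = b"abc"

# SHA-256 produces a 256-bit (32-byte) digest.
sha256_hex = hashlib.sha256(data).hexdigest()
print(sha256_hex)
print(len(hashlib.sha256(data).digest()) * 8, "bits")

# BLAKE2b is also available in hashlib and defaults to a 512-bit digest.
blake2b_digest = hashlib.blake2b(data).digest()
print(len(blake2b_digest) * 8, "bits")
```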

Author: Ayush Purawr