Numpy For Machine Learning: A Complete Guide

So far you’ve learned the general idea of what ML does, set up your environment, and got to know your coding environment, i.e. Jupyter Notebook. In this section, you’ll learn about a very powerful library called NumPy. We’ll look at the NumPy array (np array for short) and the operations on it, along with what makes it better than Python’s built-in data structures.

In this section, you’ll be getting your hands dirty by coding, so it would be good if you have your environment set up already. If not, you can follow our tutorial for the same.

Numpy Arrays

The most important entity in the whole NumPy package is the NumPy array. If you’ve worked with Python before, you must be familiar with the data structure called a list. Lists are containers that can store any kind of data.

You can think of the np array as a homogeneous list. That means an np array stores data of a single data type, i.e. you can’t store both an integer and a string in the same array. Still, a question might arise: why use the np array over the list?

Why use Numpy?

The simple answer to that question is performance. Np arrays are faster and more compact, so they use less memory than lists. Not only that, they are much more convenient to use than lists. How? We’ll see in a bit.
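To make the memory claim concrete, here is a minimal sketch that compares the footprint of a list and an array holding the same 1,000 integers. The variable names and the size are arbitrary choices for illustration, and the list figure is an estimate for CPython (the list's pointers plus the integer objects they point to).

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)  # dtype fixed so the size is predictable

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
# An array stores the raw values back to back: 8 bytes per int64.
array_bytes = np_arr.nbytes

print("list :", list_bytes, "bytes")
print("array:", array_bytes, "bytes")
```

The exact list size varies between Python versions, but it should come out several times larger than the array.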

Creating an np array

To create an np array, we can use the array() function in the NumPy module. Let’s start by creating an array from a small list of integers:-

import numpy as np
arr = np.array([2,4,3])
print(arr)
Output:-
[2 4 3]

Printing shape of the array

In the code above, we start by importing the NumPy module and then create an np array by passing an array-like object, in this case a list. This creates an np array with the elements 2, 4, 3 and an integer data type. We can also create a multidimensional np array by passing a 2-d list.

arr_2d = np.array([[2,4,6,8],
                   [3,5,7,9]])
print(arr_2d.shape)
Output:-
(2, 4)

Printing Data Type of Array Elements:-

Let’s understand the above output. We created an array with 2 rows and 4 columns. Next, we printed arr_2d.shape, which returns the shape of the array as a tuple, in this case (2, 4), where 2 is the number of rows and 4 is the number of columns.

Until now we’ve created an array by passing a list that stores data of the same type i.e. integer. But what’ll happen if we pass a list that stores data of different types?

arr = np.array([2,'a',6])
print(arr, arr.dtype)
Output:-
['2' 'a' '6'] <U21

In the above code, we created an array by passing a list with both integer and string data. But arrays are homogeneous, which is why NumPy automatically converts the integers to strings too. We then printed our array and arr.dtype. dtype returns the data type of the elements, in this case <U21.

<U21 has 3 parts in it. < denotes the byte order (little-endian), U denotes the Unicode string dtype, and 21 denotes the maximum number of characters each element can hold.
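The number in a Unicode dtype is a width in characters, which a small experiment can show. This is only a sketch with made-up strings; it relies on NumPy storing each character in 4 bytes and on assignments into a fixed-width string array being cut to that width.

```python
import numpy as np

words = np.array(["a", "abc"])   # longest string has 3 characters
print(words.dtype.kind)          # kind code for Unicode strings
print(words.dtype.itemsize)      # bytes per element: 3 characters * 4 bytes

words[0] = "abcdef"              # longer than the array's width
print(words[0])                  # only the first 3 characters are kept
```

So a width that is too small silently truncates data, which is worth keeping in mind when filling string arrays after creating them.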

Specifying the dtype of the array

Let’s take one more example:-

str_arr = np.array([2,'1',6])
int_arr = np.array([2,'1',6],dtype = 'int')

print(str_arr, str_arr.dtype)
print(int_arr, int_arr.dtype)
Output:-
['2' '1' '6'] <U21 
[2 1 6] int64

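In the above example, we created 2 arrays by passing the same list, but for int_arr we passed an extra argument, dtype, which lets us manually set the dtype of the elements in the array. If you don’t specify dtype, NumPy picks a type that can hold every element, which here means converting them all to strings. The exact integer type (int64 above) can differ between platforms.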

Creating an array with the specific format

  1. np.zeros(shape): Creates an array of the given shape (an int or a tuple) filled with 0.
  2. np.ones(shape): Creates an array of the given shape (an int or a tuple) filled with 1.
  3. np.empty(shape): Creates an array of the given shape without initializing it, so its contents are arbitrary and depend on the state of memory.
  4. np.arange(start, stop): Same as the range() function, but it returns an ndarray. start defaults to 0 and stop is excluded.
  5. np.linspace(start, stop, num): Returns num equally spaced elements in the interval [start, stop].
  6. np.full(shape, fill_value): Creates an array of the given shape (an int or a tuple) filled with fill_value.

print(np.zeros(3))
print(np.ones(5))
print(np.empty(2))
print(np.arange(2,10))
print(np.linspace(2,10,num = 3))
print(np.full(5,fill_value = 10))
Output:-
[0. 0. 0.] 
[1. 1. 1. 1. 1.] 
[6.94305989e-310 6.94304488e-310] 
[2 3 4 5 6 7 8 9] 
[ 2.  6. 10.] 
[10 10 10 10 10]
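The examples above all pass a single integer as the shape. As a small sketch of the other forms, the block below passes a tuple to get 2-d arrays and uses the optional step of np.arange; the variable names are arbitrary.

```python
import numpy as np

zeros_2d = np.zeros((2, 3))      # 2 rows, 3 columns of 0.0
sevens = np.full((2, 2), 7)      # 2x2 array filled with 7
evens = np.arange(0, 10, 2)      # start, stop (excluded), step
grid = np.linspace(0, 1, 5)      # 5 points, both ends included

print(zeros_2d.shape)
print(sevens)
print(evens)
print(grid)
```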

Array Operations

Element-wise Operations

The thing that makes an array more convenient than a list is the elementwise operation. If you want to perform a set of operations over 2 arrays or over all the elements of a single array, you can do so in a very straightforward way. Let’s see how:-

a = np.array([1,2,3,4,5,6])
print(a + 3)
Output:-
[4 5 6 7 8 9]

As you can see when we added 3 to the array it performed addition over each and every element of the array. Now let’s see how it performs as compared to lists.

[Figure: timing comparison of the same operation on a list and on a NumPy array]

From the above results, it’s clear that the NumPy array performed faster than the list. And that’s what makes arrays popular: they’re not just convenient but also efficient.

The same works for the other arithmetic operators:-
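The timing comparison described above can be reproduced with a sketch like the following. It assumes the standard-library timeit module, an arbitrary size of 100,000 elements, and helper names chosen for illustration; the numbers depend on your machine, so none are shown here.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

def add_with_list():
    # Plain Python: visit every element in a loop.
    return [x + 3 for x in py_list]

def add_with_array():
    # NumPy: one vectorized operation over the whole array.
    return np_arr + 3

list_time = timeit.timeit(add_with_list, number=20)
array_time = timeit.timeit(add_with_array, number=20)

print(f"list : {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```

Both functions produce the same values; only the time taken differs.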

a = np.array([1,2,3,4,5])
print(a + 3)
print(a - 3)
print(a * 3)
print(a / 3)
print(a // 3)
print(a ** 3)
print(a % 3)
Output:-
[4 5 6 7 8] 
[-2 -1  0  1  2] 
[ 3  6  9 12 15] 
[0.33333333 0.66666667 1.         1.33333333 1.66666667] 
[0 0 1 1 1] 
[  1   8  27  64 125] 
[1 2 0 1 2]

Operations among Arrays

We now know that applying an operation to a scalar and an array applies that operation to each element. The same goes for an operation between 2 arrays of the same shape: it is applied between the corresponding elements.

a = np.array([1,2,3,4,5])
b = np.array([5,4,3,2,1])
print(a + b)
print(a - b)
print(a * b)
print(a / b)
print(a // b)
print(a ** b)
print(a % b)
Output:-
[6 6 6 6 6] 
[-4 -2  0  2  4] 
[5 8 9 8 5] 
[0.2 0.5 1.  2.  5. ] 
[0 0 1 2 5] 
[ 1 16 27 16  5] 
[1 2 0 0 0]

There is one other operator that I want to talk about, and that is @. @ performs matrix multiplication between 2 arrays.

a = np.array([[1, 2, 3], 
              [4, 5, 6]])
b = np.array([[1, 3], 
              [4, 5],
              [2, 6]])
print(a@b)
Output:-
[[15 31]  
 [36 73]]

If you are not familiar with Matrix Multiplication then you can read about it here.
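Since * and @ are easy to mix up, here is a short sketch contrasting them on the same array. The only rule it relies on is that a matrix product needs the number of columns on the left to equal the number of rows on the right.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)

elementwise = a * a              # same shape, multiplies matching elements
matrix_product = a @ a.T         # (2, 3) @ (3, 2) -> (2, 2)

print(elementwise)
print(matrix_product)

try:
    a @ a                        # (2, 3) @ (2, 3): inner sizes 3 and 2 differ
except ValueError as err:
    print("shape mismatch:", err)
```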

Reshaping and Sorting:-

Along with arithmetic operations, you can do other interesting things with the array that might not be possible easily with lists. And these operations really are quite handy too for data manipulation. Let’s see what they are and what they do.

a = np.array([[1, 2, 3], 
              [4, 5, 6]])
print(a.ravel())
print(a)
Output:-
[1 2 3 4 5 6] 
[[1 2 3]  
 [4 5 6]]

Hmm, something weird just happened. In the above code we applied the ravel() function, which gave a flattened array as output, but when we printed the array again it was unchanged. Let’s understand what ravel() does. ravel() takes a higher-dimensional array and returns a flattened, 1-d version of it. It returns a new array object instead of modifying the array in place, so the shape of the original doesn’t change. Note that the result is a view of the same data whenever possible, not necessarily a copy; use flatten() if you need a guaranteed copy.

Let’s see a few other functions:-

a = np.array([[1, 2, 3], 
              [4, 5, 6]])
b = np.array([1, 4, 5, 6, 2, 3])

print(b.reshape((3,2)))
print(a.T)
b.sort() 
print(b)
Output:-
[[1 4]  
 [5 6]  
 [2 3]] 

[[1 4]  
 [2 5]  
 [3 6]] 
[1 2 3 4 5 6]

reshape(shape) takes a shape tuple and returns the reshaped array. Like ravel(), it doesn’t change the shape of the original array.

T returns the transpose of the array. Like reshape(), it doesn’t change the original array.

sort(), unlike the other functions seen until now, changes the array itself, i.e. it sorts the array in place.
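Whether a call hands back a fresh copy or a view onto the same data matters once you start modifying the result. The sketch below checks this with np.shares_memory for a small contiguous array; flatten() and np.sort() are included as the copying counterparts of ravel() and sort(). They are not used in the text above, but both are standard NumPy calls.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.shares_memory(a, a.ravel()))        # ravel() reuses a's data here
print(np.shares_memory(a, a.reshape(3, 2)))  # so does reshape()
print(np.shares_memory(a, a.T))              # and the transpose
print(np.shares_memory(a, a.flatten()))      # flatten() always copies

b = np.array([3, 1, 2])
sorted_copy = np.sort(b)   # returns a sorted copy and leaves b untouched
b.sort()                   # sorts b in place and returns None
print(sorted_copy, b)
```

In practice this means that writing into the result of ravel(), reshape() or T can change the original array, so call copy() first if you need an independent array.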

Other Operations:-

There are many other functions in NumPy that might seem obvious but actually are quite useful especially when you implement ML algorithms from scratch. Let’s see a few of them.

a = np.array([1, 4, 5, 6, 2, 3])

print(a.max())  #Returns Max Element of the array
print(a.min())  #Returns Min Element of the array
print(a.mean()) #Returns Mean of the array
print(a.std())  #Returns Standard Deviation of the array
Output:-
6 
1 
3.5 
1.707825127659933
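A common use of mean() and std() in ML is standardizing a feature so that it has mean 0 and standard deviation 1. Here is a minimal sketch using the same numbers as above; the variable names are illustrative only.

```python
import numpy as np

feature = np.array([1, 4, 5, 6, 2, 3], dtype=float)

# Subtract the mean and divide by the standard deviation, element-wise.
standardized = (feature - feature.mean()) / feature.std()

print(standardized)
print(standardized.mean(), standardized.std())
```

The printed mean may show up as a tiny number close to 0 rather than exactly 0 because of floating-point rounding.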

Along with the above operations, we can also use the following functions to return indices of the element instead of the element itself.

a = np.array([1, 4, 5, 6, 2, 3])

print(a.argmax())       # Returns Index of Max Element in The array
print(a.argmin())       # Returns Index of Min Element in The array
print(a.argsort())      # Returns the Indices That Would Sort the Array
print(np.argwhere(a>4)) # Returns Indices of elements that fulfill the condition
Output:-
3 
0 
[0 4 5 1 2 3] 
[[2]  
 [3]]
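Index-returning functions are handy when one array describes another. As a sketch, assume a model has produced one score per class; the class names and scores below are made up for illustration.

```python
import numpy as np

class_names = np.array(["cat", "dog", "bird"])   # hypothetical labels
scores = np.array([0.1, 0.7, 0.2])               # hypothetical model scores

best = class_names[scores.argmax()]              # label with the highest score
ranking = class_names[scores.argsort()[::-1]]    # labels from highest to lowest score

print(best)
print(ranking)
```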

There are 2 other functions and a parameter that I want to talk about. But before I explain them, I want you to look at the following code and make a guess about what they do.

a = np.array([[1,2,3,4,5,6,7,8,9]])

print(a.shape)
print(a.squeeze())
print(a.sum())
Output:-
(1, 9) 
[1 2 3 4 5 6 7 8 9] 
45

I guess you were able to work out what sum() does, i.e. it returns the sum of the array elements. But what about squeeze()? Does it flatten the array? Well, in this case it did, but squeeze() actually removes all the axes of length 1 from the array’s shape. Let’s understand with another example.

a = np.array([[[1],[2],[3]]])

print(a.shape)
print(a.squeeze())
print(a.squeeze().shape)
Output:-
(1, 3, 1) 
[1 2 3] 
(3,)

So in the above code, we took an array of shape (1,3,1) and squeezed it, which removed the axes of length 1 and converted it to a shape of (3,).
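squeeze() can also be limited to one axis, and the opposite operation, adding an axis of length 1, is available too. The sketch below uses np.newaxis and np.expand_dims for that; neither appears in the text above, but both are standard NumPy.

```python
import numpy as np

a = np.array([[[1], [2], [3]]])          # shape (1, 3, 1)

only_first = a.squeeze(axis=0)           # drop just axis 0 -> (3, 1)
flat = a.squeeze()                       # drop every length-1 axis -> (3,)
row = flat[np.newaxis, :]                # add an axis in front -> (1, 3)
column = np.expand_dims(flat, axis=1)    # add an axis at the end -> (3, 1)

print(only_first.shape, flat.shape, row.shape, column.shape)
```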

Last but not least is the axis argument.

a = np.array([[1,2,3,4],
              [5,6,7,8]])

print(a.sum(axis = 0))
print(a.sum(axis = 1))
Output:-
[ 6  8 10 12] 
[10 26]

sum() is used to calculate the sum of the array. But when we passed the axis parameter to it, we got 2 different results. axis is a parameter that most functions in NumPy have; it determines the axis along which the function is applied. For a 2-d array, axis = 0 applies the function down the rows, giving one result per column, and axis = 1 applies it across the columns, giving one result per row.

For example, in the above code, axis = 1 gave an array of 2 elements, the sums of row 1 and row 2, whereas axis = 0 gave an array of 4 elements, the sums of columns 1 to 4.
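In ML code the rows of a 2-d array are usually samples and the columns are features, so the axis argument decides whether you get one value per feature or one per sample. A small sketch with made-up data follows; keepdims is an extra parameter of these functions that keeps the reduced axis as length 1.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) with 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

feature_means = X.mean(axis=0)                  # one mean per column -> shape (2,)
sample_sums = X.sum(axis=1)                     # one sum per row -> shape (3,)
sample_sums_2d = X.sum(axis=1, keepdims=True)   # same sums as a column -> shape (3, 1)

print(feature_means)
print(sample_sums)
print(sample_sums_2d.shape)
```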

Indexing and Slicing

Slicing an array is a bit different from slicing a list. If you aren’t familiar with slicing and indexing in lists then you can read about it here.

Let’s imagine a 2-d NumPy array X with shape (r,c). Now slicing in NumPy is done like X[row_start:row_end, col_start:col_end], where elements on row_end and col_end are excluded. Let’s understand it with the code.

a = [[1,2,3],
     [4,5,6],
     [7,8,9]]
b = np.array(a)

print(b[1:,:-1])
Output:-
[[4 5]  
 [7 8]]

Voila. So what exactly happened here? We sliced array b from row 1 to the end and we sliced the columns from the start to the second last column. That’s how you slice an array.
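A few more slicing patterns on the same 3x3 array, as a sketch: picking a whole column, using a step, and the fact that a basic slice is a view, so writing into it changes the original array.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

middle_column = b[:, 1]      # every row, column 1
last_row = b[-1, :]          # last row, every column
every_other = b[::2, 1:]     # rows 0 and 2, columns 1 and 2

print(middle_column)
print(last_row)
print(every_other)

top_left = b[:2, :2]         # a basic slice is a view, not a copy
top_left[0, 0] = 100         # so this also changes b[0, 0]
print(b[0, 0])
```

If you want to modify a slice without touching the original, take a copy first with b[:2, :2].copy().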

The basics of indexing an array are the same as for a list: indexing starts from 0 at the front and from -1 at the back. With a list, you can only fetch one element at a time using an index, but with NumPy, you can fetch the elements at several indices at the same time:-

a = np.array([1,2,3,4,5,6,7,8,9])

print(a[0])
print(a[[0,3,5,2]])

a[1] = 5
print(a)

a[[1,2]] = 3
print(a)
Output:-
1 
[1 4 6 3] 
[1 5 3 4 5 6 7 8 9] 
[1 3 3 4 5 6 7 8 9]

As you can see, the single-index operation is the same as for a list. But with an array, you can also pass a list of indices to fetch or change the values of the elements at those indices.
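One place where indexing with a list or array of indices shows up in ML is shuffling a dataset while keeping each sample paired with its label. The data below is made up, and the random generator with seed 0 is just one way to get a reproducible shuffled order.

```python
import numpy as np

X = np.array([[1, 1], [2, 2], [3, 3], [4, 4]])   # hypothetical features
y = np.array([10, 20, 30, 40])                   # hypothetical labels

rng = np.random.default_rng(seed=0)
order = rng.permutation(len(X))   # the indices 0..3 in a shuffled order

X_shuffled = X[order]             # rows picked in the shuffled order
y_shuffled = y[order]             # same order, so rows and labels stay paired

print(order)
print(X_shuffled)
print(y_shuffled)
```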

Boolean Indexing

Numpy arrays have a powerful ability to use boolean conditions as input. For example what if you need to fetch all elements with values more than 4? We can simply pass the condition as an index and fetch the values. Yes, it’s that simple.

a = np.array([1,2,3,4,5,6,7,8,9])

print(a[a>4])
Output:-
[5 6 7 8 9]
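Conditions can also be combined, and a boolean index works on the left side of an assignment too. A short sketch of both follows; note that NumPy uses & and | here instead of Python's and / or, and that each condition needs its own parentheses.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

mask = (a > 2) & (a < 7)   # element-wise "and" of two conditions
print(mask)
print(a[mask])

clipped = a.copy()         # copy so that a itself stays unchanged
clipped[clipped > 6] = 6   # assign to every element matching the condition
print(clipped)
```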

Let’s understand one more function called where(). You can pass where() a boolean expression, and it returns a tuple of arrays holding the indices of the elements that satisfy that condition. And yes, that tuple can be passed as an index too.

a = np.array([1,2,3,4,5,6,7,8,9])

print(np.where(a>3))
print(a[np.where((a>3))])
Output:-
(array([3, 4, 5, 6, 7, 8]),) 
[4 5 6 7 8 9]
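np.where() also has a three-argument form that chooses between two values element by element, which is a compact way to turn a condition into labels. A small sketch, with arbitrary thresholds:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

labels = np.where(a > 3, 1, 0)   # 1 where the condition holds, 0 elsewhere
capped = np.where(a > 6, 6, a)   # the alternatives can be arrays as well

print(labels)
print(capped)
```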

Broadcasting

Broadcasting is NumPy’s ability to handle arrays of different shapes during arithmetic operations. We’ve learned how arithmetic operations on arrays are done on corresponding elements. But what if arrays of 2 different shapes are added? Let’s see with an example.

a = np.array([[1,2,3],
              [4,5,6]])
b = np.array([1,2,3])

print(a+b)
Output:-
[[2 4 6]  
 [5 7 9]]

As we can see, array a has shape (2,3) and array b has shape (3,). One might assume that this gives an error, or that the addition only happens on the first row, but it actually executed successfully and added array b to both rows. To do that, NumPy broadcast array b to [[1,2,3],[1,2,3]] and then added it to array a.

Whenever an operation is applied over 2 arrays of different but compatible shapes, the arrays are broadcast to a common shape; here the smaller array is stretched to the shape of the larger one. Shapes are compatible when, compared from the last dimension backwards, each pair of sizes is either equal or one of them is 1.
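Two more cases help to see where broadcasting works and where it stops. In this sketch, a column of shape (2, 1) and a row of shape (3,) are both stretched to (2, 3), while shapes whose last dimensions are 3 and 2 cannot be matched and raise an error.

```python
import numpy as np

column = np.array([[10], [20]])     # shape (2, 1)
row = np.array([1, 2, 3])           # shape (3,)

total = column + row                # both are stretched to shape (2, 3)
print(total.shape)
print(total)

a = np.ones((2, 3))
try:
    a + np.array([1, 2])            # last dimensions 3 and 2 are incompatible
except ValueError as err:
    print("cannot broadcast:", err)
```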

File I/O using Numpy

Along with data manipulation functions, NumPy also provides functions to load and save data. genfromtxt() and savetxt() are 2 functions that can be used to load and save data respectively. Let’s see how.

np.genfromtxt('demo.txt',delimiter = ',', dtype = 'int')
Output:-
array([[1, 2, 3],        
      [4, 5, 6],        
      [7, 8, 9]])

In the above code, we used genfromtxt() to load a file named demo.txt into a NumPy array. The parameters we passed are:-

  1. delimiter: How are elements in the row separated? In demo.txt elements were separated by a comma, hence I passed ‘,’ as a delimiter.
  2. dtype: dtype of loading array. The default is float.

We can also load .csv files using genfromtxt() the same way. We’ll learn more about .csv files in the pandas section.

a = np.array([[1.1651, 2.131513, 1.13153]])
np.savetxt('array.txt', a, delimiter=',', fmt = '%.2f')

Above we saved a NumPy array to a file named array.txt and passed ‘,’ as the delimiter, telling it to separate array elements using a comma. The other argument, fmt, defines how each element is written; here I told it to store elements with 2 decimal places only.
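To see both functions working together, here is a sketch of a round trip: save an array, load it back, and check that nothing changed. The temporary directory and the file name are assumptions made so that the example does not leave files behind; any writable path would do.

```python
import os
import tempfile
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")           # hypothetical file name
    np.savetxt(path, data, delimiter=",", fmt="%d")    # write integers, comma-separated
    loaded = np.genfromtxt(path, delimiter=",", dtype=int)

print(loaded)
print(np.array_equal(data, loaded))
```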

Hope you like our article based on “Numpy For Machine Learning: A Complete Guide”, but still if you have some questions in your mind or if you found something wrong in the article, you can comment here.


Author: Keerthana Buvaneshwaran
