Normalization is a common preprocessing step in data science and machine learning that involves scaling the values of a numerical dataset to a common scale. Normalizing a NumPy array can be useful in several ways. For example, it can improve the performance of algorithms that are sensitive to the scale of the data, such as gradient descent optimization or distance-based clustering. In this thread, we will explore some techniques that help us in normalizing the array using NumPy.
1. By "linalg.norm()" function:
linalg.norm() function in NumPy is used to calculate the norm of a given vector or matrix. The norm of a vector is a measure of its length, while the norm of a matrix is a measure of its size.
linalg.norm() function can be used to calculate various types of norms, such as the Euclidean norm, the Frobenius norm, and the infinity norm.
You can normalize a 2D array using the
linalg.norm() function. Here’s an example of normalizing a 2D array:
The above code creates a 5x5 NumPy array with random values. It uses the
np.linalg.norm() function with the axis=1 argument to calculate the norm along each row, and then dividing the original matrix by the norm of each row using
np.newaxis to ensure that the division is performed element-wise along the rows. The resulting array is an array where each row has been normalized to have a unit length.
You should use the above method when you want to preserve the relative magnitudes of values within each row.
It can help to reduce the impact of outliers within each row. It may not be suitable for data where the distribution of values across the matrix is important and where the relative magnitudes of values between rows are important.
2. By "mean()" and "std()" functions:
mean() function calculates the average value of the elements in an array. The
std() function, on the other hand, calculates the standard deviation of the elements in an array, which is a measure of the spread or dispersion of the data.
std() functions can be used to calculate the mean and standard deviation of a given array or a specific axis of a multidimensional array. For example:
The above code creates a 5x5 random matrix using NumPy’s
It then normalizes the matrix by subtracting the mean of the matrix and dividing by the standard deviation of the matrix using the formula
(x - mean(x)) / std(x).
You should use the above method when you want to normalize the entire matrix based on its distribution.
It can be useful for detecting outliers or identifying patterns in the distribution of values across the matrix.
It does not preserve the relative magnitudes of values within each row.