Normalization is a common preprocessing step in data science and machine learning that involves scaling the values of a numerical dataset to a common scale. Normalizing a NumPy array can be useful in several ways. For example, it can improve the performance of algorithms that are sensitive to the scale of the data, such as gradient descent optimization or distance-based clustering. In this thread, we will explore some techniques that help us in normalizing the array using NumPy.
1. By "linalg.norm()" function:
The linalg.norm()
function in NumPy is used to calculate the norm of a given vector or matrix. The norm of a vector is a measure of its length, while the norm of a matrix is a measure of its size.
The linalg.norm()
function can be used to calculate various types of norms, such as the Euclidean norm, the Frobenius norm, and the infinity norm.
You can normalize a 2D array using the linalg.norm()
function. Here’s an example of normalizing a 2D array:
The above code creates a 5x5 NumPy array with random values. It uses the np.linalg.norm()
function with the axis=1 argument to calculate the norm along each row, and then dividing the original matrix by the norm of each row using np.newaxis
to ensure that the division is performed element-wise along the rows. The resulting array is an array where each row has been normalized to have a unit length.
-
You should use the above method when you want to preserve the relative magnitudes of values within each row.
-
It can help to reduce the impact of outliers within each row. It may not be suitable for data where the distribution of values across the matrix is important and where the relative magnitudes of values between rows are important.
2. By "mean()" and "std()" functions:
The mean()
function calculates the average value of the elements in an array. The std()
function, on the other hand, calculates the standard deviation of the elements in an array, which is a measure of the spread or dispersion of the data.
The mean()
and std()
functions can be used to calculate the mean and standard deviation of a given array or a specific axis of a multidimensional array. For example:
The above code creates a 5x5 random matrix using NumPy’s random.rand()
function.
It then normalizes the matrix by subtracting the mean of the matrix and dividing by the standard deviation of the matrix using the formula (x - mean(x)) / std(x)
.
-
You should use the above method when you want to normalize the entire matrix based on its distribution.
-
It can be useful for detecting outliers or identifying patterns in the distribution of values across the matrix.
-
It does not preserve the relative magnitudes of values within each row.