# How to split a dataset into equal-sized subsets based on an indexing variable and calculate the mean using NumPy?

Hey, I was assigned some work in school: write code that splits a dataset into equal-sized subsets based on an indexing variable and then computes the mean of each subset. I had a hard time working out the logic for this. Can anyone please provide a solution?


Hello @safiaa.02, I have provided a solution below which splits a random dataset of 100 entries into groups based on the unique values of a random indexing variable and then calculates the mean of each group.

• The code generates an array `D` of `100` random numbers between `0` and `1`, which serves as our sample dataset. It also generates an array `S` of `100` random integers between `0` and `9` to be used as the indexing variable.
• Next, the unique values in `S` and their corresponding indices are found using `np.unique()`. An array of zeros with the same length as the number of unique values in `S` is created to store the mean of `D` values for each unique value.
• The code then loops over each unique value in `S` and calculates the mean of the `D` values that belong to it. This is done with boolean indexing: the mask `S == value` selects the `D` entries at the positions where `S` equals the current unique value.
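
The steps above can be sketched roughly as follows. This is a minimal version, not necessarily the exact code from the original answer: the fixed seed, the use of `np.random.default_rng`, and the names `unique_vals`, `inverse`, `means`, and `means_vectorised` are my own assumptions. The `np.bincount` lines are just an optional cross-check of the loop.

```python
import numpy as np

# Seeded generator so the run is repeatable (the seed value is an assumption).
rng = np.random.default_rng(0)

# D: sample dataset of 100 random floats in [0, 1).
D = rng.random(100)
# S: indexing variable, 100 random integers from 0 to 9 inclusive.
S = rng.integers(0, 10, size=100)

# Unique labels in S, plus for every element of S the position of its label
# inside unique_vals (return_inverse is one way to get "corresponding indices").
unique_vals, inverse = np.unique(S, return_inverse=True)

# One slot per unique label to hold that group's mean.
means = np.zeros(len(unique_vals))

# Boolean indexing: S == value is True exactly where D belongs to this group.
for i, value in enumerate(unique_vals):
    means[i] = D[S == value].mean()

# Optional loop-free cross-check: per-group sums divided by per-group counts.
counts = np.bincount(inverse)
means_vectorised = np.bincount(inverse, weights=D) / counts

for value, m in zip(unique_vals, means):
    print(f"group {value}: mean = {m:.4f}")
```

One thing to keep in mind: because `S` is drawn at random, the groups will usually not be exactly equal in size. If you need equal-sized subsets, one option is to build the indexing variable yourself, e.g. `S = np.repeat(np.arange(10), 10)`, which gives 10 labels with 10 entries each.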