How to create a row-strided Pandas DataFrame from a given Series using Python?

mubashir_rizvi · March 22, 2023, 4:37pm

I am trying to create a row-strided DataFrame from a Series. I have an idea of what I want, but I don’t know how to implement it in code. For example, suppose I have the following Series:

import pandas as pd
my_series = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Then I want an output similar to this:

   col1  col2  col3
0     1     2     3
1     2     3     4
2     3     4     5
3     4     5     6
4     5     6     7
5     6     7     8
6     7     8     9
7     8     9    10

In this output, each row overlaps the next one by two elements. I want to learn how to create overlapping row-strided DataFrames like this one, as well as non-overlapping ones such as:

   col1  col2  col3
0     1     2     3
1     4     5     6
2     7     8     9
3    10     -     -

If anyone could share methods, with example code, for creating both types of DataFrame (overlapping and non-overlapping) from a Series, it would help me greatly.

sabih · April 20, 2023, 1:35pm

Hi @mubashir_rizvi, this code snippet may help you:

Non-overlapping strides mean that no two rows share a value: each element of the series appears in exactly one row.
In the loop, we iterate over the positions of the series with a step size equal to the stride length. Each stride is a slice of the original series that starts at the current position and ends just before the current position plus the stride length. This produces a list of non-overlapping strides.
Finally, pd.DataFrame() builds a DataFrame with one row per stride, with the values of each stride as its columns.
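The loop described above could look something like the following minimal sketch. The values stride_len = 3 and the list name strides are assumptions for illustration, not necessarily the poster's exact code:

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
stride_len = 3  # assumed number of values per row (hypothetical parameter)

# Step through the series positions in jumps of stride_len;
# each positional slice (.iloc) becomes one row.
strides = [my_series.iloc[i:i + stride_len].tolist()
           for i in range(0, len(my_series), stride_len)]

# The last slice is shorter ([10]); pd.DataFrame pads the missing cells with NaN.
df = pd.DataFrame(strides)
print(df)
```

Using .iloc makes the slice positional, so the loop works the same even if the Series has a non-default index.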

Note:

You can rename the resulting columns, if you want, with a line similar to the one used in methods 1 and 2.
This is a simple way to create non-overlapping strides.
Also, if the length of your series is not a multiple of stride_len, the last row will be padded with NaN values. You can see this in the last row of the above code snippet.
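For the column names shown in the question (col1, col2, col3), one possible approach, assuming the same slicing loop and stride_len = 3, is to assign a list of names. The Int64 cast at the end is an optional extra, not part of the answer: it keeps the values as integers and shows the gap as <NA> instead of NaN.

```python
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
stride_len = 3  # assumed row length (hypothetical parameter)

# Same non-overlapping slicing loop as described above.
df = pd.DataFrame([my_series.iloc[i:i + stride_len].tolist()
                   for i in range(0, len(my_series), stride_len)])

# Rename the default 0, 1, 2 columns to col1, col2, col3.
df.columns = [f"col{j + 1}" for j in range(df.shape[1])]

# Optional: pandas' nullable integer dtype turns 5.0 back into 5 and NaN into <NA>.
df = df.astype("Int64")
print(df)
```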

safa · April 20, 2023, 3:52pm

Hey @mubashir_rizvi, you can also create overlapping strides with a NumPy function.

Note:

This method is the most flexible if you want to create overlapping strides.
However, a disadvantage of this method is that the NumPy function always creates len(series) - stride_len + 1 rows, one for each starting position. If the strided data can’t be divided evenly into rows and columns, you would have to drop some rows.
Also, remember that the value of overlap should be less than the stride_len.
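The answer doesn’t say which NumPy function it means. Assuming it is np.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later), a minimal sketch with the hypothetical parameters stride_len and overlap could look like this:

```python
import numpy as np
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
stride_len = 3  # values per row (hypothetical parameter)
overlap = 2     # values shared with the next row; must be < stride_len

# One window per starting position: len(my_series) - stride_len + 1 windows.
windows = np.lib.stride_tricks.sliding_window_view(my_series.to_numpy(), stride_len)

# Keep every (stride_len - overlap)-th window to get the requested overlap.
step = stride_len - overlap
df = pd.DataFrame(windows[::step],
                  columns=[f"col{j + 1}" for j in range(stride_len)])
print(df)
```

With overlap = 2 this reproduces the 8-row table from the question. With overlap = 1 the same code keeps the windows starting at 1, 3, 5 and 7, so the trailing 10 never appears in a row, which is the row-dropping drawback mentioned above. The requirement overlap < stride_len keeps step positive.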

nimrah · April 24, 2023, 6:13pm

Hey @mubashir_rizvi, you can use this method to create non-overlapping strides. Here the shape and strides of the view are computed explicitly and passed as arguments to np.lib.stride_tricks.as_strided(), which creates a view of the original data without copying it. The resulting view is then passed to pd.DataFrame(), which creates one row per stride, with the values of each stride as columns. For example:
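A minimal sketch of the as_strided approach, assuming stride_len = 3 (the variable names are illustrative). Because as_strided only describes a view of existing memory, it cannot pad with NaN, so this version keeps complete rows only and leaves out the trailing 10:

```python
import numpy as np
import pandas as pd

my_series = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
stride_len = 3  # values per row (hypothetical parameter)

values = my_series.to_numpy()
n_rows = len(values) // stride_len  # only complete rows fit inside the data
elem_stride = values.strides[0]     # bytes between consecutive elements

# Each row starts stride_len elements after the previous one (no overlap);
# within a row, consecutive values are one element apart.
view = np.lib.stride_tricks.as_strided(
    values,
    shape=(n_rows, stride_len),
    strides=(stride_len * elem_stride, elem_stride),
    writeable=False,
)

df = pd.DataFrame(view, columns=[f"col{j + 1}" for j in range(stride_len)])
print(df)
```

as_strided does no bounds checking, so a shape or strides value that reaches past the end of the array reads unrelated memory. Computing n_rows with integer division keeps the view inside the data, and writeable=False prevents accidental writes through the view.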