python – Moving forward in a panda dataframe looking for the first occurrence of multi-conditions with reset

I am having trouble with multi-conditions moving forward in a dataframe.

Here’s a simplification of my model:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'date':pd.date_range(start="2022-05-12", periods=27),
    'l': [10.0,9.9,11.1,10.9,12.1,9.6,13.1,17.9,18.0,15.6,13.5,14.2,10.5,9.5,7.6,9.8,10.2,15.3,17.7,21.8,10.9,18.9,16.4,13.3,7.1,6.8,9.4],
    'c': [10.5,10.2,12.0,11.7,13.5,10.9,13.9,18.2,18.8,16.2,15.1,14.8,11.8,10.1,8.9,10.5,11.1,16.9,19.8,22.0,15.5,20.1,17.7,14.8,8.9,7.3,10.1],
    'h': [10.8,11.5,13.4,13.6,14.2,11.4,15.8,18.5,19.2,16.9,16.0,15.3,12.9,10.5,9.2,11.1,12.3,18.5,20.1,23.5,21.1,20.5,18.2,15.4,9.6,8.4,10.5],
    'oc': [False,True,False,False,False,True,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False],
    's': [np.nan,10.5,np.nan,np.nan,np.nan,14.5,np.nan,np.nan,np.nan,np.nan,8.1,np.nan,10.7,np.nan,np.nan,np.nan,np.nan,6.9,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
    'i': [np.nan,9.5,np.nan,np.nan,np.nan,13.6,np.nan,np.nan,np.nan,np.nan,7.0,np.nan,9.9,np.nan,np.nan,np.nan,np.nan,9.2,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
    't': [np.nan,15.5,np.nan,np.nan,np.nan,16.1,np.nan,np.nan,np.nan,np.nan,16.5,np.nan,17.2,np.nan,np.nan,np.nan,np.nan,25.0,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]
})

df = df.set_index('date')
# df Index is datetime type

print(df)

               l     c     h     oc     s     i     t
date                                                 
2022-05-12  10.0  10.5  10.8  False   NaN   NaN   NaN
2022-05-13   9.9  10.2  11.5   True  10.5   9.5  15.5
2022-05-14  11.1  12.0  13.4  False   NaN   NaN   NaN
2022-05-15  10.9  11.7  13.6  False   NaN   NaN   NaN
2022-05-16  12.1  13.5  14.2  False   NaN   NaN   NaN
2022-05-17   9.6  10.9  11.4   True  14.5  13.6  16.1
2022-05-18  13.1  13.9  15.8  False   NaN   NaN   NaN
2022-05-19  17.9  18.2  18.5  False   NaN   NaN   NaN
2022-05-20  18.0  18.8  19.2  False   NaN   NaN   NaN
2022-05-21  15.6  16.2  16.9  False   NaN   NaN   NaN
2022-05-22  13.5  15.1  16.0   True   8.1   7.0  16.5
2022-05-23  14.2  14.8  15.3  False   NaN   NaN   NaN
2022-05-24  10.5  11.8  12.9   True  10.7   9.9  17.2
2022-05-25   9.5  10.1  10.5  False   NaN   NaN   NaN
2022-05-26   7.6   8.9   9.2  False   NaN   NaN   NaN
2022-05-27   9.8  10.5  11.1  False   NaN   NaN   NaN
2022-05-28  10.2  11.1  12.3  False   NaN   NaN   NaN
2022-05-29  15.3  16.9  18.5   True   6.9   9.2  25.0
2022-05-30  17.7  19.8  20.1  False   NaN   NaN   NaN
2022-05-31  21.8  22.0  23.5  False   NaN   NaN   NaN
2022-06-01  10.9  15.5  21.1  False   NaN   NaN   NaN
2022-06-02  18.9  20.1  20.5  False   NaN   NaN   NaN
2022-06-03  16.4  17.7  18.2  False   NaN   NaN   NaN
2022-06-04  13.3  14.8  15.4  False   NaN   NaN   NaN
2022-06-05   7.1   8.9   9.6  False   NaN   NaN   NaN
2022-06-06   6.8   7.3   8.4  False   NaN   NaN   NaN
2022-06-07   9.4  10.1  10.5  False   NaN   NaN   NaN

This is the result I am trying to achieve:

               l     c     h     oc     s     i     t    cc  diff    r
date                                                                  
2022-05-12  10.0  10.5  10.8  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-13   9.9  10.2  11.5   True  10.5   9.5  15.5   NaN   NaN  NaN
2022-05-14  11.1  12.0  13.4  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-15  10.9  11.7  13.6  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-16  12.1  13.5  14.2  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-17   9.6  10.9  11.4   True  14.5  13.6  16.1   NaN   NaN  NaN
2022-05-18  13.1  13.9  15.8  False   NaN   NaN   NaN  True   5.3    t
2022-05-19  17.9  18.2  18.5  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-20  18.0  18.8  19.2  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-21  15.6  16.2  16.9  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-22  13.5  15.1  16.0   True   8.1   7.0  16.5   NaN   NaN  NaN
2022-05-23  14.2  14.8  15.3  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-24  10.5  11.8  12.9   True  10.7   9.9  17.2   NaN   NaN  NaN
2022-05-25   9.5  10.1  10.5  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-26   7.6   8.9   9.2  False   NaN   NaN   NaN  True  -7.0    s
2022-05-27   9.8  10.5  11.1  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-28  10.2  11.1  12.3  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-29  15.3  16.9  18.5   True   6.9   9.2  25.0   NaN   NaN  NaN
2022-05-30  17.7  19.8  20.1  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-05-31  21.8  22.0  23.5  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-06-01  10.9  15.5  21.1  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-06-02  18.9  20.1  20.5  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-06-03  16.4  17.7  18.2  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-06-04  13.3  14.8  15.4  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-06-05   7.1   8.9   9.6  False   NaN   NaN   NaN  True  -7.7    i
2022-06-06   6.8   7.3   8.4  False   NaN   NaN   NaN   NaN   NaN  NaN
2022-06-07   9.4  10.1  10.5  False   NaN   NaN   NaN   NaN   NaN  NaN

Principles:

  • We always move forward in the dataframe
  • When oc is True we ‘memorize’ both c, s, i and t values ​​from this row
  • Moving forward we look for the first occurrence of the following conditions:
  1. h >= t
  2. l <= s
  3. l <= i

When it happens we set cc to True and we calculate the difference of the ‘memorized’ values ​​when oc was True and write a letter to distinguish the condition:

  • If h >= t: diff = tc and r=”t”
  • If l <= s: diff = sc and r="s"
  • If l <= i: diff = ic and r="i"

Once one of the conditions has been met, we look again for oc is True and then the conditions to be met, until the end of the dataframe.

If oc is True again before one of the conditions has been met since the first time it was True, we omit it.

What happens chronologically:

  1. 2022-05-13: oc is True so we memorize c, s, i, t
  2. 2022-05-17: oc is True but none of the conditions have been met, yet -> omission
  3. 2022-05-18: h > t[2022-05-13] -> diff = t[2022-05-13]-c[2022-05-13] = 15.5-10.2 = 5.3, r=”t”
  4. 2022-05-22: oc is True so we memorize c, s, i, t
  5. 2022-05-24: oc is True but none of the conditions have been met, yet -> omission
  6. 2022-05-26: l < s[2022-05-22] -> diff = s[2022-05-22]-c[2022-05-22] = 8.1-15.1 = -7.0, r=”s”
  7. 2022-05-29: oc is True so we memorize c, s, i, t
  8. 2022-06-05: l < i[2022-05-29] -> diff = i[2022-05-29]-c[2022-05-29] = 9.2-16.9 = -7.7, r=”i”

A loop works but take an enormous amount of time, if possible I’d like to avoid it.

I’ve tried a really good solution from Baron Legendre described here which works perfectly when looking for equal values ​​but I can’t seem to adapt it to my model. Also I’m having an index problem: I’m getting different results when using a datetime Index even when I reset it.

I’ve been stuck with that problem for a while now so any help would gladly be appreciated.

Leave a Comment