I have a huge dataset of lat/lon coordinates plus some features (it’s public transport data). My goal is to count the traffic lights on a route. The problem is that the data structure doesn’t let me extract separate routes, so I have to work with the whole dataset, which covers several months at one-second granularity. This means that for each traffic light I have many GPS coordinates that are only a few meters apart. The task is to reduce these points so that I end up with only ONE coordinate pair per traffic light. After a few preprocessing steps, I now have a dataframe that looks like this:

```
import pandas as pd
data = {'date': ['2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18',
'2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-18','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04',
'2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04','2021-10-04'],
'time': ['04:55:55','04:57:20','04:57:47','04:57:56','04:58:06','04:59:59','05:00:57','05:02:06','05:02:49','05:03:15','05:04:01','05:04:26','05:04:49','05:05:06','05:06:42','05:07:39',
'05:08:52', '05:09:39','05:09:45','05:10:48','04:33:02','04:33:30','04:33:40','04:33:50', '04:35:44', '04:36:50', '04:38:05', '04:38:55', '04:39:27', '04:40:14', '04:40:39','04:40:48',
'04:41:03', '04:42:03', '04:42:58', '04:43:55', '04:44:48', '04:44:55', '04:45:23', '04:45:42'],
'intersection': ['intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light',
'intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light',
'intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light',
'intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light',
'intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light',
'intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light',
'intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light',
'intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light','intersection with traffic light'],
'longitude': [12.39476,12.39273139,12.39165194,12.39082944,12.39048889,12.38607694,12.384235,12.38217306,12.38082611,12.377815,12.37579417,12.37344,12.37324472,12.37328833,12.37339389,
12.37346444,12.37354778,12.37365694,12.37367139,12.37369528,12.39270917,12.39166111,12.39082972,12.39049861,12.38609611,12.38424417,12.38213278,12.38081778,12.37780417,12.37576306,
12.37338806,12.37325139,12.37328833,12.3733425,12.37341,12.37349139,12.37359806,12.37361,12.3736375,12.37372528],
'latitude': [51.28439083,51.28715778,51.28938389,51.29012778,51.29072139,51.29580667,51.30110722,51.30425417,51.30640389,51.30806056,51.3088925,51.3099825,51.31048111,51.31150611,
51.3147425,51.31745667,51.32047083,51.32413,51.32455833,51.32514389,51.28719056,51.28937667,51.2901275,51.29070417,51.29577389,51.30108056,51.30431,51.30641833,51.30806444,51.30890833,
51.31002389,51.31054167,51.31151028,51.31470833,51.31748917,51.32046694,51.32410833,51.32452333,51.32514528,51.3266075],
'measure': [0.58,46.11,59.51,4.93,22.51,-58.49,97.24,6.27,6.12,5.98,5.97,5.98,5.91,6.12,5.98,64.76,252.86,134.51,198.99,4.71,0.0,0.0,0.0,0.0, 1.6,1.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.6,1.6,1.6,
1.6,0.0,0.0,0.0],
'distance': [0.0, 338.80883546503435,258.8586443913898,100.70483527232803,70.18422302767863,644.0439236587058,603.5386497851093,378.4917854757065,256.9494020292987,279.3917492835428,
168.5934711941755,204.0901046994475,57.11922120985705,114.0761160032451,360.13741023688664,302.003203591515,335.38882347626014,407.1700274516668,47.664275824026944,65.16745993800518,342.7123292169329,
253.96517553235194,101.69272687297509,68.18793453494897,642.2083802222841,604.3526349828485,388.29131154667175,251.84909010340672,278.74949164225853,170.5056375968028,206.95284289377074,
58.38815419745605,107.79273387770405,355.81675100197504,309.41626170998205,331.33853649772004,405.18904714111363,46.17811034656468,69.22127740066063,162.7935968085381]}
df = pd.DataFrame(data=data)
df
```

What I did: first, I kept only the first row wherever the value in the `intersection` column changed. Then I kept only the rows with `'intersection with traffic light'` and calculated the distances between consecutive points with `geopy.distance`. Based on that distance I can filter out GPS coordinates that are within, e.g., 50 meters of each other. But this doesn’t solve my problem, and I think it is the wrong approach. I guess I should build a matrix of the distances between all pairs of points, e.g. with `cdist`, and use that matrix to filter out the redundant GPS coordinates. Or is there another way? Please advise!
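A minimal sketch of the all-pairs idea, written in plain Python so it runs without extra packages: compute the haversine distance between every pair of points, link points that lie within a radius of each other, and keep the mean position of each linked group. The names `haversine_m` and `cluster_points` and the 50 m default radius are assumptions made for illustration; the sample coordinates are six rows taken from the dataframe above.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def cluster_points(lats, lons, radius_m=50.0):
    """Link points within radius_m of each other (transitively) and
    return one (mean_lat, mean_lon, point_count) tuple per linked group."""
    n = len(lats)
    parent = list(range(n))  # union-find forest: each point starts alone

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-pairs comparison: merge the groups of any two points that are close.
    for i in range(n):
        for j in range(i + 1, n):
            if haversine_m(lats[i], lons[i], lats[j], lons[j]) <= radius_m:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)

    return [
        (sum(lats[i] for i in idx) / len(idx),
         sum(lons[i] for i in idx) / len(idx),
         len(idx))
        for idx in groups.values()
    ]

# Six sample rows from the dataframe: two passes over the same three intersections.
lats = [51.28715778, 51.28719056, 51.28938389, 51.28937667, 51.29012778, 51.2901275]
lons = [12.39273139, 12.39270917, 12.39165194, 12.39166111, 12.39082944, 12.39082972]

for lat, lon, count in cluster_points(lats, lons, radius_m=50.0):
    print(f"{lat:.6f}, {lon:.6f}  ({count} points)")
```

On the dataframe this could be called as `cluster_points(df['latitude'].tolist(), df['longitude'].tolist())`. The double loop is quadratic in the number of points, so for the full dataset a spatial index or a density-based clusterer such as scikit-learn's `DBSCAN` with `metric='haversine'` would probably be a better fit.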

**UPDATE1** I added a Kepler visualisation of the problem so you can see it: from this bunch of points I need only one point per intersection with a traffic light.

**UPDATE2** I made the example df a bit bigger, so that the Kepler visualisation shows at least 2 GPS points per intersection (in the real data I have to deal with about 50 GPS points per intersection on average). Here’s the code for the Kepler visualisation:

```
import geopandas as gpd
from keplergl import KeplerGl

map_exmpl = KeplerGl(height=800, width=600)
# Create a GeoDataFrame from the longitude/latitude columns
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.longitude, df.latitude))
map_exmpl.add_data(data=gdf)
map_exmpl
```