# %% [markdown]
# # Using the DLRUTDataset
#
# This example gives an overview of the methods and attributes available on the `DLRTrajectoryDataset` class,
# using the DLR UT dataset as an example.
#
# ## Load trajectory data
# First, we need to load the trajectory data of the dataset.
# %%
from tasi.dlr import DLRTrajectoryDataset
from tasi.dlr.dataset import DLRUTDatasetManager, DLRUTVersion

dataset = DLRUTDatasetManager(DLRUTVersion.latest)
dataset.load()

ds = DLRTrajectoryDataset.from_csv(dataset.trajectory()[50])

# %% [markdown]
# ## Attributes of the dataset
# There are several attributes available to get information about a dataset. For instance, we can get the interval of a
# dataset via the property
# %%
ds.interval
# %% [markdown]
# or all unique timestamps of it via
# %%
ds.timestamps
# %% [markdown]
# or the ids of all traffic participants in the dataset.
# %%
ds.ids
# %% [markdown]
# ## Filtering
# If you want to look into a short sequence of the overall dataset, you can select specific rows of the overall dataset.
# The `tasi.DLRTrajectoryDataset` provides various ways for this purpose.
#
# ### Time and object
# There are two ways to filter a dataset based on the information in the dataset's `index`. For instance, if you
# want to filter the dataset by an interval, you can use the `tasi.DLRTrajectoryDataset.during` method.
# %%
ds.during(ds.timestamps[0], ds.timestamps[10])
# %% [markdown]
# This returns the rows within the given interval.
#
# Another way to select specific rows of the dataset is by the `id` of a traffic participant. This might be useful
# if you want to take a closer look at the behavior of specific traffic participants. For instance, to filter by the
# second traffic participant in the dataset, we can combine the `tasi.DLRTrajectoryDataset.ids` attribute with
# the `trajectory` method.
# %%
ds.trajectory(ds.ids[1])
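# %% [markdown]
# To picture what these two index-based filters do, the following minimal sketch reproduces them with plain pandas
# on a toy frame. It assumes a two-level index named `timestamp` and `id` and an interval with inclusive bounds. The
# frame, the level names and the bound handling are illustrative assumptions, not the actual layout or implementation
# of `tasi`.
# %%
import pandas as pd

# hypothetical toy data: three timestamps, two traffic participants (ids 7 and 9)
toy_index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2024-01-01 00:00:00", "2024-01-01 00:00:01", "2024-01-01 00:00:02"]), [7, 9]],
    names=["timestamp", "id"],
)
toy = pd.DataFrame({"x": [0.0, 5.0, 1.0, 5.5, 2.0, 6.0]}, index=toy_index)

# select all rows within a time interval (assumption: both bounds inclusive)
toy_times = toy.index.get_level_values("timestamp")
toy_during = toy[(toy_times >= toy_times[0]) & (toy_times <= toy_times[2])]

# select all rows of a single traffic participant
toy_trajectory = toy[toy.index.get_level_values("id") == 9]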

# %% [markdown]
# ### Traffic participant properties
#
# There are also methods available that might help to find the relevant information in the dataset. The most
# straightforward option is to use pandas' capability to access specific attributes of the dataset. The attributes
# available on the dataset are listed by the `tasi.DLRTrajectoryDataset.attributes` property.
# %%
ds.attributes
# %% [markdown]
# We can, for instance, access the traffic participants' `center` position.
# %%
ds.center
# %% [markdown]
# or the classification probabilities.
# %%
ds.classifications
# %% [markdown]
# We extended these basic capabilities with additional methods that, for instance, determine the most likely class
# by pose, i.e., separately for each pose of a traffic participant
# %%
ds.most_likely_class(by="pose")
# %% [markdown]
# or by the overall trajectory (the default), i.e., over all poses of a traffic participant.
# %%
ds.most_likely_class(by="trajectory")
# %% [markdown]
# This might help to filter the dataset to select only traffic participants that are classified as a
# `car`. To achieve this, we first get the most likely class per trajectory, select the rows having the value 'car' and
# pass their index (the traffic participant's `id`) into the `tasi.DLRTrajectoryDataset.trajectory` method.
# %%
classification = ds.most_likely_class(by="trajectory")

ds.trajectory(classification[classification == "car"].index)
# %% [markdown]
# You can achieve the same result by directly calling
# %%
ds.cars
# %% [markdown]
# This works similarly for all object classes.
# %%
ds.trucks
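# %% [markdown]
# The class-based selection can also be pictured with plain pandas. The sketch below assumes one row of class
# probabilities per pose, averages them per traffic participant and picks the class with the highest mean. The toy
# data, the column names and the averaging rule are assumptions made for illustration; `most_likely_class` may
# aggregate differently.
# %%
import pandas as pd

# hypothetical class probabilities, one row per pose, indexed by the traffic participant's id
toy_probabilities = pd.DataFrame(
    {"car": [0.9, 0.4, 0.2, 0.1], "truck": [0.1, 0.6, 0.8, 0.9]},
    index=pd.Index([1, 1, 2, 2], name="id"),
)

# most likely class per pose: the column with the highest probability in each row
toy_class_by_pose = toy_probabilities.idxmax(axis=1)

# most likely class per trajectory: average the probabilities per id, then take the best column
toy_class_by_trajectory = toy_probabilities.groupby(level="id").mean().idxmax(axis=1)

# ids of all traffic participants whose most likely class is "car"
toy_car_ids = toy_class_by_trajectory[toy_class_by_trajectory == "car"].index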
# %% [markdown]
# ### Custom filter or transformer
# You can also build your own filter or transformer that operates on the trajectory or pose level. For this purpose,
# the `TrajectoryDataset.apply` method can be used.
#
# For example, let's assume that you want to analyse the length of the different trajectories within a dataset. This may
# be useful for finding anomalies. In the following, we will count the number of measurements per traffic participant.
# %%
import pandas as pd

tj_length = ds.apply(len, by="trajectory")

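# create bins of width 100 measurements and count traffic participants within bins;
# the upper bound is extended by one bin width so that the longest trajectory is not dropped
ds_binned = pd.cut(tj_length, range(0, int(tj_length.max()) + 100, 100))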
counts = ds_binned.value_counts().sort_index()

ax = counts.plot(kind="bar")
# %% [markdown]
# If you are instead interested in the length of each trajectory in *meters*, we can utilize `shapely`. To achieve this,
# we convert the `tasi.TrajectoryDataset` to a `tasi.GeoTrajectoryDataset` and gain access to the
# `shapely` feature set. This enables us to use the `length` attribute which is the length of the *geometry*.
# %%
import numpy as np

gds = ds.as_geopandas("center")
gds.set_geometry("center", inplace=True)
tj_length = gds.length

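# create bins of width 10 meters and count traffic participants within bins;
# the upper bound is rounded up and extended by one bin width so that the longest trajectory is not dropped
ds_binned = pd.cut(tj_length, range(0, int(np.ceil(tj_length.max())) + 10, 10))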
counts = ds_binned.value_counts().sort_index()

ax = counts.plot(kind="bar")
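# %% [markdown]
# Both histograms rely on `pandas.cut`, which by default assigns each value to a half-open bin `(left, right]` and
# returns `NaN` for values outside all bins. The following minimal sketch on made-up lengths shows why the last bin
# edge should not be smaller than the largest value.
# %%
import pandas as pd

# hypothetical trajectory lengths
toy_lengths = pd.Series([40, 120, 180, 250])

# edges 0, 100, 200: the value 250 lies outside all bins and becomes NaN
toy_short = pd.cut(toy_lengths, list(range(0, 250, 100)))

# edges 0, 100, 200, 300: every value falls into a bin
toy_full = pd.cut(toy_lengths, list(range(0, 250 + 100, 100)))

# number of values per bin, ordered by bin
toy_counts = toy_full.value_counts().sort_index()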