Basics
Generate signal
This example covers basic usage of the change point detection methods in the roerich library. Let’s generate a signal with several regimes, that is, with different distributions of observations. The goal is to find all change points, the positions where the distribution changes. The task can also be viewed as signal segmentation, where each segment corresponds to one regime of the signal.
import roerich
import roerich.algorithms
import roerich.metrics
import numpy as np
# generate a time series with change points
X, cps_true = roerich.generate_dataset(period=200, N_tot=2000)
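For intuition, a signal with several regimes can be built by concatenating segments drawn from different distributions. The following is a minimal NumPy sketch under that assumption. The helper make_piecewise_signal is hypothetical and only illustrates the idea; it is not how roerich.generate_dataset is implemented.

```python
import numpy as np

def make_piecewise_signal(period=200, n_total=2000, seed=0):
    """Hypothetical helper: Gaussian segments with a new mean and scale every `period` points."""
    rng = np.random.default_rng(seed)
    n_segments = n_total // period
    segments = []
    for _ in range(n_segments):
        mean = rng.uniform(-3.0, 3.0)    # regime-specific mean (assumed range)
        scale = rng.uniform(0.5, 2.0)    # regime-specific standard deviation (assumed range)
        segments.append(rng.normal(mean, scale, size=period))
    signal = np.concatenate(segments).reshape(-1, 1)  # shape (n_total, 1)
    # a change point is the first index of every segment except the first one
    change_points = np.arange(period, n_segments * period, period)
    return signal, change_points

signal, change_points = make_piecewise_signal()
print(signal.shape, change_points[:3])
```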
ChangePointDetectionClassifier
The first method we use here is ChangePointDetectionClassifier. It is based on binary classifiers from machine learning. A classifier takes two parts of the signal, each with window_size observations, and separates them into two classes. Its predictions are then used to estimate the probability density ratio for these observations and to calculate the change point score between the windows. The method refits the classifier on each new pair of windows.
from roerich.algorithms import ChangePointDetectionClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
# sklearn-like binary classifier
clf = QuadraticDiscriminantAnalysis()
# change points detection
cpd = ChangePointDetectionClassifier(base_classifier=clf, metric='KL_sym', periods=1,
                                     window_size=100, step=1, n_runs=1)
score, cps_pred = cpd.predict(X)
# visualization
roerich.display(X, cps_true, score, cps_pred)
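To make the idea of a classifier-based score concrete, here is a minimal sketch for a single pair of windows. It assumes equal window sizes, so that p / (1 - p) computed from the predicted class probability p approximates the density ratio, and it evaluates the classifier on the same windows it was fitted on, which is a simplification. The function window_score_kl_sym is hypothetical and is not the roerich implementation.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def window_score_kl_sym(ref_window, test_window, eps=1e-6):
    """Illustrative symmetric KL estimate from classifier probabilities."""
    X_pair = np.vstack([ref_window, test_window])
    y_pair = np.concatenate([np.zeros(len(ref_window)), np.ones(len(test_window))])
    clf = QuadraticDiscriminantAnalysis().fit(X_pair, y_pair)
    # P(class 1 | x), clipped to keep the logarithms finite
    p_ref = np.clip(clf.predict_proba(ref_window)[:, 1], eps, 1 - eps)
    p_test = np.clip(clf.predict_proba(test_window)[:, 1], eps, 1 - eps)
    log_ratio_ref = np.log(p_ref / (1 - p_ref))
    log_ratio_test = np.log(p_test / (1 - p_test))
    # estimate of KL(test || ref) + KL(ref || test)
    return log_ratio_test.mean() - log_ratio_ref.mean()

rng = np.random.default_rng(0)
same = window_score_kl_sym(rng.normal(0, 1, (100, 1)), rng.normal(0, 1, (100, 1)))
shifted = window_score_kl_sym(rng.normal(0, 1, (100, 1)), rng.normal(3, 1, (100, 1)))
print(same, shifted)
```

The score should stay close to zero when both windows come from the same regime and grow when the distributions differ, which is what makes it usable as a change point score.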
Now, let’s measure the quality of change point detection using the precision, recall, and PR AUC metrics.
# metrics
precision, recall = roerich.metrics.precision_recall_scores(cps_true, cps_pred, window=20)
auc = roerich.metrics.pr_auc(cps_true, cps_pred, score[cps_pred], window=20)
print('Precision: ', precision)
print('Recall: ', recall)
print('PR AUC: ', auc)
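The window argument suggests that a predicted change point is compared with the true ones up to some tolerance. As a rough illustration, the sketch below counts a prediction as a true positive when it falls within window points of a true change point that has not been matched yet. This matching rule is an assumption made for the example; the exact rule used by roerich.metrics may differ.

```python
import numpy as np

def precision_recall_with_tolerance(cps_true, cps_pred, window=20):
    """Hypothetical matching rule: each true change point can be claimed by one prediction."""
    cps_true = np.asarray(cps_true)
    matched = np.zeros(len(cps_true), dtype=bool)
    true_positives = 0
    for cp in cps_pred:
        # unmatched true change points inside the tolerance window
        candidates = np.where(~matched & (np.abs(cps_true - cp) <= window))[0]
        if len(candidates) > 0:
            nearest = candidates[np.argmin(np.abs(cps_true[candidates] - cp))]
            matched[nearest] = True
            true_positives += 1
    precision = true_positives / len(cps_pred) if len(cps_pred) else 0.0
    recall = true_positives / len(cps_true) if len(cps_true) else 0.0
    return precision, recall

# two of four predictions hit a true change point; two of three true points are found
example_precision, example_recall = precision_recall_with_tolerance(
    [200, 400, 600], [195, 410, 900, 905], window=20)
print(example_precision, example_recall)
```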
The ChangePointDetectionClassifier method can take any sklearn-like binary classifier. For example, you can use your favorite Neural Network or Gradient Boosting over Decision Trees algorithm. The following example uses a shallow Neural Network implemented in PyTorch as a base classifier for change point detection.
# pytorch NN classifier
clf = roerich.algorithms.NNClassifier(n_hidden=10, n_epochs=10, batch_size=64, lr=0.1, l2=0.0)
# detection
cpd = ChangePointDetectionClassifier(base_classifier=clf, metric='KL_sym', periods=1,
                                     window_size=100, step=1, n_runs=1)
score, cps_pred = cpd.predict(X)
cps_pred
ChangePointDetectionRuLSIF
Similarly, the ChangePointDetectionRuLSIF method also estimates the probability density ratio between two windows of the signal. However, it does so using regression models with the RuLSIF loss function, which predict the ratios directly without learning the individual distributions. Roerich provides two RuLSIF models: NNRuLSIFRegressor, a shallow Neural Network implemented in PyTorch, and GBDTRuLSIFRegressor, Gradient Boosting over Regression Trees. The following examples show how to use them for change point detection.
from roerich.algorithms import NNRuLSIFRegressor, ChangePointDetectionRuLSIF
# pytorch NN regressor with RuLSIF loss function
reg = NNRuLSIFRegressor(n_hidden=10, n_epochs=10, batch_size=64,
                        lr=0.1, l2=0.0, alpha=0.05)
# detection
cpd = ChangePointDetectionRuLSIF(reg, metric='PE', periods=1,
                                 window_size=100, step=1, n_runs=1)
score, cps_pred = cpd.predict(X)
# visualization
roerich.display(X, cps_true, score, cps_pred)
from roerich.algorithms import GBDTRuLSIFRegressor
# regressor with RuLSIF loss function
reg = GBDTRuLSIFRegressor(n_estimators=10, max_depth=2)
# detection
cpd = ChangePointDetectionRuLSIF(reg, metric='PE', periods=1,
                                 window_size=100, step=1, n_runs=1)
score, cps_pred = cpd.predict(X)
cps_pred
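To illustrate what fitting with the RuLSIF loss means, the sketch below uses the classical kernel formulation instead of a neural network or boosted trees. A model g(x) is fitted to approximate the relative ratio p(x) / (alpha * p(x) + (1 - alpha) * q(x)) by minimizing alpha/2 * E_p[g^2] + (1 - alpha)/2 * E_q[g^2] - E_p[g], which has a closed-form solution for a linear-in-kernels model. The function names, kernel width sigma, and regularization lam are assumptions chosen for this example, and the code is not taken from roerich.

```python
import numpy as np

def gaussian_kernel(X, centers, sigma):
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2 * sigma ** 2))

def rulsif_pe(test_window, ref_window, alpha=0.05, sigma=1.0, lam=0.1):
    """Sketch: fit g ~ p / (alpha * p + (1 - alpha) * q) and return the PE divergence estimate."""
    centers = test_window                                # kernel centers (assumed choice)
    K_p = gaussian_kernel(test_window, centers, sigma)   # features of samples from p
    K_q = gaussian_kernel(ref_window, centers, sigma)    # features of samples from q
    H = alpha * K_p.T @ K_p / len(K_p) + (1 - alpha) * K_q.T @ K_q / len(K_q)
    h = K_p.mean(axis=0)
    # minimizer of 0.5 * theta' H theta - h' theta + 0.5 * lam * |theta|^2
    theta = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    g_p, g_q = K_p @ theta, K_q @ theta
    return (-alpha / 2 * np.mean(g_p ** 2)
            - (1 - alpha) / 2 * np.mean(g_q ** 2)
            + np.mean(g_p) - 0.5)

rng = np.random.default_rng(0)
pe_same = rulsif_pe(rng.normal(0, 1, (100, 1)), rng.normal(0, 1, (100, 1)))
pe_shifted = rulsif_pe(rng.normal(3, 1, (100, 1)), rng.normal(0, 1, (100, 1)))
print(pe_same, pe_shifted)
```

The models do not estimate p and q separately. Only the ratio is learned, and the PE value computed from it plays the role of the change point score for the pair of windows.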
OnlineNNClassifier
OnlineNNClassifier is based on the same principles as the previous methods, but it uses a single Neural Network to detect change points in the whole time series. The network scans the signal point by point (window_size=1) and updates its weights online. It estimates the probability density ratio between distant observations of the signal (lag_size=100). As a result, the method is faster than the previous algorithms, because it fits the network online and reuses it from previous iterations.
from roerich.algorithms import OnlineNNClassifier
# detection
cpd = OnlineNNClassifier(net='default', scaler="default", metric="KL_sym",
                         periods=1, window_size=1, lag_size=100, step=1,
                         n_epochs=10, lr=0.1, lam=0.0001, optimizer="Adam")
score, cps_pred = cpd.predict(X)
# visualization
roerich.display(X, cps_true, score, cps_pred)
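The online scheme can be illustrated with a much simpler model than a neural network. The sketch below keeps one logistic regression for the whole series, performs a gradient step per time step on the current observation (class 1) and the observation lag steps earlier (class 0), and uses the difference of their log-odds as the score. The function online_lag_scores and its parameters are hypothetical; this is an analogue of the idea under these assumptions, not the roerich algorithm.

```python
import numpy as np

def online_lag_scores(x, lag=50, lr=0.1):
    """Sketch: one logistic model, updated at every time step, compares x[t] with x[t - lag]."""
    w, b = 0.0, 0.0
    scores = np.zeros(len(x))
    for t in range(lag, len(x)):
        # one stochastic gradient step per observation; weights carry over between steps
        for value, label in ((x[t], 1.0), (x[t - lag], 0.0)):
            p = 1.0 / (1.0 + np.exp(-(w * value + b)))
            w -= lr * (p - label) * value
            b -= lr * (p - label)
        # difference of log-odds between the current and the lagged observation
        scores[t] = w * (x[t] - x[t - lag])
    return scores

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])  # change point at t = 200
scores = online_lag_scores(x, lag=50)
print(scores[100:200].mean(), scores[200:250].mean())
```

Because the model is never refitted from scratch, each step costs a constant amount of work, which is the source of the speed-up described above.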
OnlineNNRuLSIF
OnlineNNRuLSIF works the same way as OnlineNNClassifier, but uses a Neural Network regression model with the RuLSIF loss function, fitted online.
from roerich.algorithms import OnlineNNRuLSIF
cpd = OnlineNNRuLSIF(alpha=0.05, net='default', scaler="default",
                     periods=1, window_size=1, lag_size=100, step=1, n_epochs=10,
                     lr=0.1, lam=0.0001, optimizer="Adam")
score, cps_pred = cpd.predict(X)
cps_pred