Human activity recognition

Description

This example demonstrates change point detection for a human activity recognition task.

We use the “WISDM Smartphone and Smartwatch Activity and Biometrics Dataset” [1, 2], prepared by the Wireless Sensor Data Mining (WISDM) Lab in the Department of Computer and Information Science of Fordham University. The dataset includes accelerometer and gyroscope data from a smartphone and a smartwatch, collected as 51 subjects performed 18 diverse activities of daily living. Each activity was performed for 3 minutes, so each subject contributed 54 minutes of data. These activities include basic ambulation-related activities (e.g., walking, jogging, climbing stairs), hand-based activities of daily living (e.g., brushing teeth, folding clothes), and various eating activities (e.g., eating pasta, eating chips).

Each subject wore a smartwatch on their dominant hand and carried a smartphone in their pocket. Data collection was controlled by a custom-made app that ran on both the smartphone and the smartwatch. The sensor data was collected at a rate of 20 Hz (i.e., every 50 ms) from the accelerometer and gyroscope on both devices, yielding four sensors in total.
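To get a feel for the scale of the data, the nominal figures above can be turned into sample counts. This is only a back-of-the-envelope sketch: it assumes every activity lasts exactly 3 minutes at exactly 20 Hz and that the 18 activities of one subject appear as contiguous segments in a single file, which real recordings may not satisfy exactly. The helper `samples_to_seconds` is introduced here for illustration.

```python
# Nominal figures taken from the dataset description above.
SAMPLING_RATE_HZ = 20
ACTIVITY_MINUTES = 3
N_ACTIVITIES = 18

# Samples recorded by one sensor during one activity.
samples_per_activity = SAMPLING_RATE_HZ * ACTIVITY_MINUTES * 60
# Samples recorded by one sensor for one subject (all activities).
samples_per_subject = samples_per_activity * N_ACTIVITIES
# Number of activity switches if the activities are contiguous segments.
transitions = N_ACTIVITIES - 1

print(samples_per_activity)  # 3600
print(samples_per_subject)   # 64800
print(transitions)           # 17


def samples_to_seconds(n_samples, rate_hz=SAMPLING_RATE_HZ):
    """Convert a number of samples into seconds (illustrative helper)."""
    return n_samples / rate_hz


# A 1000-sample window corresponds to 50 seconds of signal.
print(samples_to_seconds(1000))  # 50.0
```

Under these assumptions, a detection window of 1000 samples (the value used later in this example) covers 50 seconds, which is well below the 180 seconds of a single activity.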

  • [1] Weiss, Gary. (2019). WISDM Smartphone and Smartwatch Activity and Biometrics Dataset. UCI Machine Learning Repository. [link]

  • [2] Weiss, Gary & Yoneda, Kenichi & Hayajneh, Thaier. (2019). Smartphone and Smartwatch-Based Biometrics Using Activities of Daily Living. IEEE Access. PP. 1-1. [DOI].

Data download

The dataset is stored in the UCI Machine Learning Repository and can be downloaded with the commands below.

#!wget -q https://archive.ics.uci.edu/ml/machine-learning-databases/00507/wisdm-dataset.zip
#!unzip -o -q wisdm-dataset.zip

In this example, we use data from a smartwatch accelerometer collected from one subject. The data file contains the following columns:

  • id: uniquely identifies the subject. Range: 1600-1650;
  • activity: identifies a specific activity. Range: A-S (no “N” value);
  • timestamp: Linux time;
  • x: sensor value for the x axis. May be positive or negative;
  • y: same as x, but for the y axis;
  • z: same as x, but for the z axis.

Let’s download the dataset and read a sample.

# import of basic libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import roerich
# helper function: removes every ';' from the raw file (rewrites the file in place)
def remove_char(fname):
    with open(fname, "r") as f:
        flist = f.readlines()
        flist = [s.replace(';', '') for s in flist]
    with open(fname, "w") as f:
        f.writelines(flist)
# path to a file in the dataset
data_path = "wisdm-dataset/raw/watch/accel/data_1600_accel_watch.txt"

# remove ';' from the file
remove_char(data_path)

# read the data
cols = ["id", "activity", "timestamp", "x", "y", "z"]
data = pd.read_csv(data_path, names=cols)
data.head()
     id activity       timestamp         x         y         z
0  1600        A  90426708196641  7.091625 -0.591667  8.195502
1  1600        A  90426757696641  4.972757 -0.158317  6.696732
2  1600        A  90426807196641  3.253720 -0.191835  6.107758
3  1600        A  90426856696641  2.801216 -0.155922  5.997625
4  1600        A  90426906196641  3.770868 -1.051354  7.731027
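The helper above rewrites the data file on disk. If you prefer to leave the raw file untouched, the same cleaning can be done in memory. The sketch below assumes that each record in the raw file ends with a `;` (which is why the helper strips that character); the two sample lines are copied from the output above with that terminator added back, and the function name `read_wisdm_text` is introduced here for illustration.

```python
import io

import pandas as pd

# Two records in the assumed raw format: comma-separated values ending with ';'.
raw_text = (
    "1600,A,90426708196641,7.091625,-0.591667,8.195502;\n"
    "1600,A,90426757696641,4.972757,-0.158317,6.696732;\n"
)

cols = ["id", "activity", "timestamp", "x", "y", "z"]


def read_wisdm_text(text, names=cols):
    """Strip ';' characters in memory and parse the result as CSV."""
    cleaned = text.replace(";", "")
    return pd.read_csv(io.StringIO(cleaned), names=names)


sample = read_wisdm_text(raw_text)
print(sample)
```

For a real file, the text can be obtained with `open(data_path).read()` and passed to the same function.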

Preprocessing

We will use only sensor measurements and activity labels. So, let’s define the input matrix X and the label vector y.

X = data[['x', 'y', 'z']].values
y = data['activity'].values

However, we do not need the activity labels themselves, only the moments when the activity changes. We treat these moments as change points of the signal, and our goal is to detect them.

# transform activity labels into change point positions
def get_true_cpds(y):
    cps_true = []
    for i in range(1, len(y)):
        if y[i] != y[i-1]:
            cps_true.append(i)
    return np.array(cps_true)
# get true change point positions
cps_true = get_true_cpds(y)
# visualization
roerich.display(X, cps_true, score=None, cps_pred=None)
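For long recordings, the same transformation can be written without an explicit Python loop. The following is an equivalent NumPy sketch; the name `get_true_cpds_vectorized` is introduced here for illustration and is not part of roerich.

```python
import numpy as np


def get_true_cpds_vectorized(y):
    """Return the indices i where y[i] differs from y[i-1]."""
    y = np.asarray(y)
    # Compare each label with its predecessor; add 1 because the comparison
    # array starts at the second element of y.
    return np.flatnonzero(y[1:] != y[:-1]) + 1


labels = np.array(["A", "A", "B", "B", "B", "C"])
print(get_true_cpds_vectorized(labels))  # [2 5]
```

Each returned index is the position of the first sample of a new activity, matching the convention of the loop-based version.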

Then, we preprocess the time series using StandardScaler. This step is not mandatory in general, but it can sometimes be useful. In addition, we add some noise to the signal. This trick acts as a regularization technique that reduces the sensitivity of the change point detection methods and decreases the number of change points detected within a single activity.

from sklearn.preprocessing import StandardScaler

# scale the signal
ss = StandardScaler().fit(X)
X_ss = ss.transform(X)

# add noise for regularization
X_ss += np.random.normal(0, 0.1, X_ss.shape)
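Because the noise is random, the detection results can vary slightly from run to run. One way to make this step reproducible is to draw the noise from a seeded generator. The sketch below uses a toy matrix in place of the real signal and standardizes it by hand, column by column, which is what StandardScaler computes with its default settings. The seed value and the toy data are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen for illustration

# Toy stand-in for the (n_samples, 3) accelerometer matrix.
X_toy = rng.normal(loc=[1.0, -2.0, 9.8], scale=[2.0, 0.5, 3.0], size=(1000, 3))

# Column-wise standardization: zero mean and unit variance per axis.
X_scaled = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# Gaussian noise with the same standard deviation (0.1) as in the cell above.
X_noisy = X_scaled + rng.normal(0.0, 0.1, size=X_scaled.shape)

# Each value should be close to sqrt(1 + 0.1**2), i.e. about 1.005.
print(X_noisy.std(axis=0))
```

After scaling, every axis has unit standard deviation, so noise with a standard deviation of 0.1 is small relative to the signal on each axis.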

Change point detection

We use the ChangePointDetectionClassifier method to estimate switches between different regimes of the time series. The detected change points can then be used to segment the sensor signal into different human activities.

from roerich.algorithms import ChangePointDetectionClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# sklearn-like binary classifier
clf = QuadraticDiscriminantAnalysis()

# change points detection
cpd = ChangePointDetectionClassifier(base_classifier=clf, metric='KL_sym', periods=1, 
                                     window_size=1000, step=10, n_runs=1)
score, cps_pred = cpd.predict(X_ss)

# visualization
roerich.display(X_ss, cps_true, score, cps_pred)
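To build intuition for classifier-based change point detection, here is a heavily simplified sketch of the general idea: slide two adjacent windows along the signal, train a binary classifier to tell them apart, and use the quality of the separation as a change point score. This is not roerich's implementation. The functions `window_score` and `sliding_scores` are hypothetical, the score is a symmetric log-ratio contrast chosen for illustration, and the classifier is evaluated on its own training data for brevity.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis


def window_score(ref, test, eps=1e-6):
    """Score how well a classifier separates a reference and a test window."""
    X = np.vstack([ref, test])
    y = np.concatenate([np.zeros(len(ref)), np.ones(len(test))])
    clf = QuadraticDiscriminantAnalysis().fit(X, y)
    # Probability that each sample belongs to the test window.
    p = np.clip(clf.predict_proba(X)[:, 1], eps, 1 - eps)
    log_ratio = np.log(p / (1 - p))
    # Symmetric contrast: large when the two windows are easy to separate.
    return log_ratio[y == 1].mean() - log_ratio[y == 0].mean()


def sliding_scores(X, window_size, step):
    """Compute the score at every candidate position t along the signal."""
    positions = np.arange(window_size, len(X) - window_size + 1, step)
    scores = np.array([
        window_score(X[t - window_size:t], X[t:t + window_size])
        for t in positions
    ])
    return positions, scores


# Synthetic two-dimensional signal with a mean shift at index 200.
rng = np.random.default_rng(0)
signal = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),
    rng.normal(3.0, 1.0, size=(200, 2)),
])

positions, scores = sliding_scores(signal, window_size=50, step=10)
print(positions[np.argmax(scores)])  # expected to land near the shift at 200
```

When both windows come from the same regime, the classifier cannot separate them and the score stays low. When the boundary between the windows coincides with a change, the score peaks.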

Quality metrics

Finally, we calculate the Precision and Recall metrics to measure the quality of the detected change points. In addition, we plot the Precision-Recall curve using different thresholds for the detection score.

from roerich.metrics import precision_recall_scores, precision_recall_curve, auc_score

# precision and recall
precision, recall = precision_recall_scores(cps_true, cps_pred, window=50)
print('Precision: ', precision)
print('Recall: ', recall)

# PR curve and AUC
thr, precision, recall = precision_recall_curve(cps_true, cps_pred, score[cps_pred], window=50)
auc = auc_score(thr, precision, recall)
print("PR AUC: ", auc)

# visualization
plt.plot(recall, precision)
plt.scatter(recall, precision)
plt.xlabel("Recall", size=14)
plt.ylabel("Precision", size=14)
plt.grid()
plt.show()
Precision:  0.8947368421052632
Recall:  1.0
PR AUC:  0.9966359092656671
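The `window` argument used above suggests that a predicted change point is counted as correct when it lies close enough to a true one. The sketch below shows one common way to compute precision and recall with such a tolerance: each prediction is matched to the nearest still-unmatched true change point within `window` samples. It is an assumption about the general idea, not a reproduction of `roerich.metrics`, and the function `tolerant_precision_recall` is hypothetical.

```python
import numpy as np


def tolerant_precision_recall(cps_true, cps_pred, window):
    """Precision and recall where a match is allowed within +/- window samples."""
    cps_true = np.asarray(cps_true)
    matched = np.zeros(len(cps_true), dtype=bool)
    tp = 0
    for cp in np.sort(np.asarray(cps_pred)):
        # Distance to every true change point; already matched ones are excluded.
        dist = np.abs(cps_true - cp).astype(float)
        dist[matched] = np.inf
        if len(dist) and dist.min() <= window:
            matched[dist.argmin()] = True
            tp += 1
    precision = tp / len(cps_pred) if len(cps_pred) else 0.0
    recall = tp / len(cps_true) if len(cps_true) else 0.0
    return precision, recall


# Three true change points, four predictions; the one at 400 matches nothing.
print(tolerant_precision_recall([100, 300, 500], [95, 310, 400, 505], window=50))
# (0.75, 1.0)
```

In this toy case every true change point is found (recall 1.0), while one of the four predictions is a false alarm (precision 0.75).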