


11 JUN 2015

Activity Detection with MATLAB

Is physical activity an important part of your quest to stay fit? Have you given a thought to
how much time you spend walking or running every week?
Recent studies have shown that nearly 20% of all adults use some form of technology to
track their activities. Do you subscribe to this quantified self movement and do you
analyze your daily activities to gain more insights about yourself?
This blog post illustrates how I used an Android device coupled with machine learning
algorithms provided by MATLAB and Statistics Toolbox to detect my activity in real-time.
Hardware and Software Setup
Some book-keeping before we begin. We used the following components:

A computer running MATLAB R2014b, with the MATLAB Support Package for
Android Sensors, and Statistics Toolbox installed
An Android smartphone with MATLAB Mobile app installed
WiFi or cellular data connection on the smartphone
We connected the Android device to the computer using the MATLAB Connector. To
learn more about doing this, refer to the getting started guide on this page.
We also recommend reading this blog post introducing sensor integration with MATLAB.
When we watch a person, it is easy for us to tell what activity they are performing even if we
have never seen them in the past. This is because our brains are already trained to
understand human activities. When viewing the activity, the brain compares it to thousands
of activities it has memorized and pops out the one that matches. Similarly, a computer (or
phone) can identify the activity I am performing based on activities I have trained it to recognize.
On a computer, a machine learning algorithm can be used to learn human activities and
detect the activity being performed for the new data that is collected. A detection task such
as this, which involves categorizing data into separate classes, is called classification.
Another example of a classification task would be assigning a diagnosis to a patient as
described by presence of certain symptoms.
Applying a classification algorithm to this task involves two steps: training and detection.
The training step builds a model which maps training data to certain categories. The
detection step maps new data to a category.
In my application, I used the acceleration sensor (accelerometer) in my Android phone to
help identify the activity that I am performing. I chose the K-Nearest Neighbor (KNN)
classifier. This is a suitable algorithm for my application because it can detect the activity
very quickly and has good accuracy when working with low-dimensional data (a small set of
features). It detects the category to which a new data point belongs by taking a majority
vote of its K closest neighbors in the training data set.
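As a toy illustration of this majority-vote idea (the data values below are invented purely for demonstration, and knnsearch stands in for a full classifier):

```matlab
% Hypothetical 1-D training data with two activity classes
X = [1; 2; 3; 10; 11; 12];                  % training points
labels = {'walk'; 'walk'; 'walk'; 'run'; 'run'; 'run'};

% Find the K = 3 training points closest to a new observation
idx = knnsearch(X, 11.5, 'K', 3);           % requires Statistics Toolbox

% A majority vote among those neighbors decides the class
predicted = mode(categorical(labels(idx)))  % -> run
```

Here all three nearest neighbors of 11.5 belong to the 'run' class, so the vote is unanimous.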
The process of detecting activities was performed in three steps:
1) Data Collection: I collected the 3-dimensional acceleration data from the accelerometer
on my Android phone.
2) Feature Extraction: I identified and extracted distinct features in the accelerometer data
for each activity that I wanted to detect.
3) Activity Classification: I used the features extracted for the various activities to train the
classifier. The classifier was then used on new accelerometer data to identify the activity
being performed.
Data Collection

While the detection is performed by a classifier, the classifier must first be trained on a set
of known data points. MATLAB Mobile, in conjunction with the MATLAB Support
Package for Android Sensors, enabled me to gather data from the device's accelerometer
and send the measurements to the MATLAB session on my computer.
Once I established a connection with the Android device, I instantiated a mobiledev object
to record sensor data being sent from the device. I then enabled the accelerometer sensor
on the device and started logging measurements through MATLAB.
mobileSensor = mobiledev() % create mobiledev object
mobileSensor.AccelerationSensorEnabled = 1; % enable accelerometer
mobileSensor.start; % start sending data

I was idle for the first 10 seconds after evaluating that last command. Then I stood up and
walked for the next 70 seconds. Having reached the staircase, I climbed downstairs and
sprinted for around 60 seconds. Then I walked for a further 70 seconds, climbed up some
stairs and walked back to my office. To wrap things up, I sat down and remained idle until
the end.
I acquired the 3-dimensional acceleration data that was recorded and visualized it:
[accel, time] = accellog(mobileSensor); % acquire data from logs
plot(time, accel); % plot data

The plot above contains 3-dimensional accelerometer data for all the activities that I
performed: being idle, walking, running, climbing stairs, and going down the stairs. As you
may have noticed, it is not possible to distinguish between each of these activities by
simply looking at this plot. Therefore, I had to identify features that would help identify each
activity and distinguish it from the others.
Feature Extraction
Though the raw accelerometer data for each activity looks similar in the time domain, it
contains unique characteristics that we can use to distinguish between the different
activities: for example, the maximum value of all the data points, or the number of data points
above a certain threshold. We call these characteristics features. Using these features,
we can distinguish and classify activities. There are several features we can consider, for
example: mean, standard deviation, median, variance, maximum, minimum, magnitude of
frequency component etc. However, to perform feature extraction efficiently we need to find
a minimum set of features that can distinguish between the different activities without being
very resource intensive.
Of the several different possible features, I found the following 6 features to be the most useful:

Feature_1: The mean of magnitude data

Feature_2: The squared sum of magnitude data below the 25th percentile
Feature_3: The squared sum of magnitude data below the 75th percentile
Feature_4: Peak frequency in spectrum of y-axis data below 5 Hz
Feature_5: Number of peaks in spectrum of y-axis data below 5 Hz
Feature_6: Integral of spectrum of y-axis data from 0 to 5 Hz
Note that the magnitude data is the square root of the sum of the squared y-axis and z-axis
acceleration readings. We can safely ignore the x-axis readings because they do not vary
much across the different activities; this is due to the orientation of the phone in my
pocket. I extracted these 6 features for each activity from 5 seconds' worth of data recorded
for the respective activity. I chose 5 seconds as the length of my window because it was

long enough to provide consistent and stable features for detection. You can play with the
window length but keep in mind that a long window (like a minute) should be avoided
because there is a high probability that activities change during that duration.
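As one possible sketch of this step (the variable names are mine, and the author's extractTrainingFeature implementation may differ; findpeaks requires Signal Processing Toolbox), the six features could be computed from one uniformly resampled 5-second window like this:

```matlab
% accel: N-by-3 matrix of [x y z] samples for one 5-second window,
% assumed already resampled to a uniform rate of Fs Hz
Fs = 50;                                     % assumed sampling rate
mag = sqrt(accel(:,2).^2 + accel(:,3).^2);   % magnitude of y- and z-axis data

f1 = mean(mag);                              % Feature_1: mean of magnitude
f2 = sum(mag(mag < prctile(mag, 25)).^2);    % Feature_2: squared sum below 25th percentile
f3 = sum(mag(mag < prctile(mag, 75)).^2);    % Feature_3: squared sum below 75th percentile

% One-sided spectrum of the y-axis data, restricted to 0-5 Hz
Y = abs(fft(accel(:,2)));
N = numel(Y);
freq = (0:N-1) * Fs / N;                     % frequency axis in Hz
band = freq > 0 & freq <= 5;
bandFreq = freq(band);

[pks, locs] = findpeaks(Y(band));            % spectral peaks below 5 Hz
[~, iMax] = max(pks);
f4 = bandFreq(locs(iMax));                   % Feature_4: peak frequency
f5 = numel(pks);                             % Feature_5: number of peaks
f6 = trapz(bandFreq, Y(band));               % Feature_6: integral of spectrum, 0 to 5 Hz

feature = [f1, f2, f3, f4, f5, f6];
```

This sketch assumes at least one spectral peak exists in the 0-5 Hz band; a robust implementation would guard against an empty findpeaks result.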
After calculating the above 6 features for the different activities
performed as a part of the training procedure, we fed the training algorithm two inputs: the
features and the appropriate response (i.e., the activity the features denote). The feature
variable below is calculated by taking the raw 3-dimensional accelerometer data
corresponding to the period where I was running and calculating the six features
mentioned above. An example:
feature = [30, 15, 7.6, 2.3, 5, 8];
activity = 'running';

Here the feature variable is [Feature_1, Feature_2, Feature_3, Feature_4, Feature_5,

Feature_6]. Each (feature, activity) pair constitutes one training data point. To learn more
about how to obtain these training data points, refer to the recordTrainingData MATLAB
script in the accompanying zip file. The raw accelerometer data of each activity is saved in
a MAT file. I included input prompts between each activity to avoid data collection during
transitions. This ensured that the raw data for each activity was clean and consistent for
feature extraction.
To extract features from the raw data saved in each MAT file, refer to the
extractTrainingFeature MATLAB function in the downloaded zip file. Note that the raw
accelerometer data is sampled at about 200 Hz but could be significantly less due to the
way an Android phone works. In addition, the sampling rate might change across the period
of measurement, thereby making the data non-uniformly sampled. To account for non-uniformly
sampled data, I implemented a resampling algorithm. This led to more accurate
feature identification and better classification. In the following plot, the red line with x
markers indicates the raw y-axis accelerometer data, which is non-uniformly sampled, and
the blue line with o markers indicates the resampled data.
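A minimal version of such a resampling step, assuming time and accel come from the accellog call above and using linear interpolation onto a uniform grid (the 50 Hz target rate is my assumption), might look like:

```matlab
% time: non-uniform timestamps in seconds; accel: N-by-3 acceleration samples
Fs = 50;                                  % assumed target sampling rate in Hz
tUniform = (time(1):1/Fs:time(end))';     % uniform time grid
accelUniform = interp1(time, accel, tUniform, 'linear');

% Overlay raw (x markers) and resampled (o markers) y-axis data
plot(time, accel(:,2), 'rx-', tUniform, accelUniform(:,2), 'bo-');
legend('Raw (non-uniform)', 'Resampled (uniform)');
```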

Note that the first three features from the list above are in the time domain, while the rest
are in the frequency domain. From the following plots we see that the features for different
activities cluster together nicely (for instance, all the red dots on the graph correspond to
the features related to running). This nature of the features (distinct clusters for different
activities) allows us to accurately identify the activity being performed for new
accelerometer data:

Activity Classification

In general, a training algorithm requires many training data points to build a reliable model
for detection. To this end, I collected over a thousand training data points for each activity
for classifier training.
To begin with, I grouped the features in an array in the following order: Walking, Running,
Idling, Climbing Upstairs, and Going Downstairs:
data = [featureWalk; featureRun; featureIdle; featureUp; featureDown];

In the above line of code, featureWalk is a 1000-by-6 array of six features calculated using the
raw accelerometer data collected while I was walking. Similarly, featureRun is a 1000-by-6
array of six features calculated using the raw accelerometer data collected while I was
running, and so on. Once I had the features for all the activities in the data array, I noticed
that features corresponding to Run were on a relatively larger scale than features
corresponding to either Idle or Walking state. This creates a bias in a feature across
different activities and will affect the capability of the algorithm to accurately detect the
activity being performed for new data (which might be scaled differently). Therefore, I
normalized the values in data to confine the range of values to be between [0, 1]:
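The normalization can be done with a per-column min-max rescaling; a sketch (using bsxfun for compatibility with R2014b, which predates implicit expansion):

```matlab
% Rescale each of the six feature columns in data to the range [0, 1]
dataMin = min(data);      % 1-by-6 vector of per-column minima
dataMax = max(data);      % 1-by-6 vector of per-column maxima
data = bsxfun(@rdivide, bsxfun(@minus, data, dataMin), dataMax - dataMin);
```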

The plot above illustrates the raw Feature_1 values calculated across all activities, and also
the normalized version. As is shown, after normalization the values of Feature_1 lie
between [0, 1]. A similar operation was applied to the remaining 5 features as well.
Once the data was normalized and ready to be used, I had to define the response that the
machine learning algorithm has to output when it receives the data array as an input. The

input data and the output response would then be used to teach the machine learning algorithm how
to classify new data. To build the output response vector, I first assigned an integer for each
activity: -1, 0, 1, 2, 3 for going downstairs, idling, climbing upstairs, walking and running
respectively. Since I need a response for each input feature set, I created a column vector
(containing these integers) of the length of training feature data points of each activity as
the response vector. To make the detected activity easily human readable, I converted the
response vector to a categorical array with values Going Downstairs, Idling, Climbing
Upstairs, Walking, Running and Transition:
Down = -1 * ones(length(featureDown), 1);
Idle = zeros(length(featureIdle), 1);
Up = ones(length(featureUp), 1);
Walk = 2 * ones(length(featureWalk), 1);
Run = 3 * ones(length(featureRun), 1);
responseVector = [Walk; Run; Idle; Up; Down]; % building the output response
valueset = [-1:3, -10];
cateName = {'Going downstairs', 'Idling', 'Climbing upstairs', 'Walking', ...
'Running', 'Transition'};
response = categorical(responseVector, valueset, ...
cateName); % converting to a categorical array

After generating the response array above, I then trained the K-NN algorithm to obtain a
model. To do this, I used the FITCKNN function from Statistics Toolbox. For this application,
after a few trials, I chose K (NumNeighbors property) to be 30 as this provided the
required performance and accuracy for detection.
mdl = fitcknn(data, response);
mdl.NumNeighbors = 30;

Having generated a model using the training data, I wanted to use it on new data from my
phone to validate the detected activity and thereby validate the model. For this, I used the
custom extractFeature MATLAB function to calculate the six features of

interest. The calculated features (saved to the newFeature variable) are then used along
with the model to detect the activity being performed:
newFeature = [0.15, 0.28, 0.2, 0.35, 0.65, 0.7]; % features for the new data
result = predict(mdl,newFeature); % predicting the activity

I trained the model to distinguish between 5 activities. A natural question would be: what if
the actual activity is different from all 5 activities? The detector will still evaluate the activity
in the current detection window and assign low matching scores to each of the 5 activities.
Also, low scores can occur when a transition from one activity to another happens during
the detection window, which may confuse the predictor. So, instead of reporting the
detection results with low matching scores, I have designed the predictor to report a
transition when the probability of prediction for each class is less than 95%. This rule also
applies when I am transitioning between two recognizable activities, such as from walking
to running. This is reasonable because features are usually not stable for a few consecutive
windows during a transition.
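This rule can be sketched using the second output of PREDICT, which for a K-NN model contains the fraction of the K neighbors voting for each class (a sketch, not necessarily the author's exact code):

```matlab
[label, score] = predict(mdl, newFeature);   % score: posterior probability per class

% Report 'Transition' unless one class wins at least 95% of the neighbor votes
if max(score) < 0.95
    result = categorical({'Transition'});
else
    result = label;
end
```

With NumNeighbors set to 30, a score of 0.95 corresponds to at least 29 of the 30 neighbors agreeing on one class.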
Below is a plot of the raw data from my phone collected for about a minute and the detected
activities. To make it easier to read the detected activities, I plotted the detected activity at
the bottom of the graph below. Each detected state (including Transition, Climbing Upstairs, and
Going Downstairs) has its own marker; for example, x marks Walking, * marks Running, and o marks Idling:

Note that the machine learning algorithm is trained using raw accelerometer data for
various activities that I performed. If you used the algorithm trained with my data, as shown
above, to detect activities using accelerometer data that you collected with your phone,
the algorithm might not be very accurate. This is true even if you used the same phone as I
did and placed it in the right front pocket of your pants. This is because you might
have a different gait than I do. The measured sensor data also depends on your height,
weight and the distance of the phone from the floor.
To create an activity detector that accurately detects your activities, you will have to start by
collecting multiple accelerometer datasets from your phone for each activity that you would
like to detect. Next, extract the 6 features listed above for each dataset by using the
extractTrainingFeatures MATLAB function. Once you have the features of the training data
extracted, then use this data to train the machine learning algorithm. Finally, use this to
detect the new activities that you are performing.
We are only scratching the surface of what's possible here. This application can be adapted
to any other detection system involving a vehicle (bike or car) or even mobile robots. In
addition to the accelerometer, any of the available sensors such as the GPS, gyroscope, or
magnetometers can be used to build an active tracking application. What other ways do you

envision using sensor data from a mobile device such as this to gain insight? What are you
learning from your quantified self? Let us know in the comments below.
Go here to learn more about analysis and visualization capabilities offered by MathWorks
tools. Click the button below to access all the MATLAB code used in this article.

Download Code
If you do not have access to MATLAB and would like to get a trial copy, please click here.
To purchase a copy of MATLAB please click here. If you are a student, then click here to
purchase MATLAB for student use.
I would like to thank Wen Jiang for the development of MATLAB code and data visualization
graphs shared in this article.
Copyright 2015 The MathWorks, Inc.