
Time Series Clustering Toolbox - Excel Web Add-In
This Excel Web Add-In aims to assist users with their data analytics by completing a cluster analysis - the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
In the case of current Excel Web Add-In, the algorithm finds similar vectors in a selected Excel range on a user's worksheet and creates a new worksheet with already clustered vectors and additional information. The whole Add-In workflow can be divided into three steps:
-
gathering required data by prompting a user to fill specific information into the Add-In dialog box - provide a reference to a range of values on an Excel worksheet, provide a number of clusters, choose what type of output information should be displayed by postprocessing;
-
data preprocessing - checking the non-numerical numbers in a referenced range, filling gaps using linear interpolation between known data points;
-
clustering of the data using the k-means algorithm;
-
data postprocessing - providing information general and detailed information about the results of the clustering procedure.
The main advantage of this Add-In concerns the fact that it can work with the time series and its timestamps - automatically splitting a time series into vectors using a predefined time interval.
Note: in order to run this Add-In, a user needs to be signed in using your Microsoft Account and have a valid license.
Input data
When the Add-in is started, the user is prompted to choose one of the following types of input data for further analysis: "Input is a 2-dimensional range" or "Inputs are two vectors of timestamps and values" (picture on the right).
-
"Input is a 2-dimensional range" means that a user will be prompted to set a reference to a non-empty range and choose the type of intervals ("Rows" or "Columns)". Add-In will group columns or rows of this referenced range depending on the chosen option.
-
"Inputs are two vectors of timestamps and values" is specially designed to cluster time series. In this case, a user will be prompted to set references to a non-empty range with values, a non-empty range with timestamps and provide time interval. Add-In will split this referenced range with values according to timestamp range and given time interval, and further group this split vectors.

Input data is a "2-dimensional range"
Step 1
Step 2
Step 1
On this step, a user is prompted to set a reference to a non-empty range and choose the type of interval. The referenced range should have at least 3 columns or 3 rows depending on the chosen option. For the user's convenience, the button "Use selected Range" was made to set a reference to the selected worksheet range automatically.
Step 2
On this step, a user is prompted to set a number of clusters and (optional, only if exists) set a reference to a range with interval names. The number of clusters should be greater than 2 and lower than a number of columns (rows) of the range given in "Step 1". If the toggle "Fixed Value" is set to "No", then the clustering procedure will find an optimal number of clusters in a range from 2 to the given value in field "Number of Clusters"; if the toggle is set to "Yes", then the clustering procedure will try to find exactly given number of clusters in input range.




Inputs are two vectors of timestamps and values
Step 1
Step 2
Step 1
On this step, a user is prompted to set a reference to a range with timestamps and set the interval length. The referenced range should have at least 3 columns or 3 rows. Interval length should be three times smaller than the time period given in the referenced range.
Step 2
On this step, a user is prompted to set a reference to a range with values and set a number of clusters. The number of clusters should be greater than 2. Range with values should have the same length as referenced in "Step 1" range with timestamps. If the toggle "Fixed Value" is set to "No", then the clustering procedure will find an optimal number of clusters in a range from 2 to the given value in field "Number of Clusters"; if the toggle is set to "Yes", then the clustering procedure will try to find exactly given number of clusters in input range.




Output parameters
Step 3 ("Almost Done!")
On this step, a user is prompted to choose how the clustering results should be displayed:
-
"General Info"
-
"Centroids"
-
"Detailed Info"
Explanation of each option will be given in the chapter below.
A user is also asked to provide the name of the worksheet, where the results will be written.
Step 4 ("Recluster (optional)")
After the clustering procedure is done, a user can run the clustering procedure again with a different number of clusters and (or) different output information and write the results to a different worksheet.


Understanding the results
The output results of clustering procedure consist of three parts:
-
General Information - shows how many intervals each cluster has; shows exactly what interval to which cluster belongs;
-
Centroids - shows the mean interval per cluster by averaging the intervals, which to this cluster belong
-
Detailed Infomation - shows exactly what interval to which cluster belongs, also showing all the input data, but already sorted according to cluster affiliation.
