General Overview
Data analysis is a general term for a group of tools and methods whose aim is to extract useful information from an available data set. This information can generally be defined as an answer to the following two questions:
- "What has happened in the past?"
- "What will happen in the future?"
Depending on the case, the reasons and circumstances under which a specific event did or did not take place in the past can also be extracted from the same data set. For a prediction, the future reasons and circumstances must be given beforehand.

Steps in Data Analysis
The whole process of data analysis can be divided into the following parts: Data Access, Pre-processing, Aggregation, Modeling, Prediction and Visualization (a minimal Python sketch of these steps follows the list):
- Data Access covers gathering the data from a database;
- Pre-processing consists of data inspection, plausibility checks, cleaning and sampling. These steps simplify the subsequent data analysis and increase its quality;
- Aggregation (optional step) corresponds to the basic level of data analysis, where aggregated values (e.g. sum, mean, median) and other properties (e.g. standard deviation) of a single variable trend are calculated;
- Modeling (optional step) takes place when two or more variable trends are available, and concerns the following methods:
  - common statistical methods, which aim to determine differences and relationships between these trends (more information can be found here):
    - Differences: t-test, F-test, binomial test, ANOVA, sign test, Friedman test, Wilcoxon test, chi-square test;
    - Relationships: simple, multiple and logistic regression, correlation, chi-square contingency;
  - grouping of variables (clustering) and reduction of variables (factor analysis);
  - machine learning algorithms (neural networks, self-organizing maps);
- Prediction (optional step) aims to forecast future events using components already built in the Aggregation and Modeling steps;
- Visualization is a set of tools and methods with which the results of a completed data analysis are presented. Results can be shown as tables, bar charts, pie charts, 2D and 3D scatter and line plots, and others.

Quality
Unfortunately, there is no clear definition of the term "quality" in the field of data analysis. The amount and relevance of the useful information extracted from the data depends largely on the skills of the data scientist. However, data analysis usually leads to the following results:
- finding out the typical behavior of the whole data structure;
- understanding relations between variables;
- detecting outliers and unusual phenomena (a simple example follows this list);
- forecasting future behavior.
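As a small illustration of the outlier point above, the following sketch applies the classic interquartile-range rule in Python; the measurement values are hypothetical:

```python
import numpy as np

# Hypothetical measurement series with two injected outliers.
values = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 25.0, 10.1, 9.7, -3.0, 10.0])

# IQR rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
mask = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
print("outliers:", values[mask])   # -> [25. -3.]
```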
Software
Up to now I have been working with the following software:
- Matlab
- Tableau
- Eclipse BIRT
- R Studio
- Python
Tasks for private customers are carried out using R and Python. Both are commonly used for data analysis and have the following advantages (a short illustration follows this list):
- an effective data handling and storage facility;
- a suite of operators for calculations on arrays, in particular matrices;
- a large, coherent, integrated collection of tools for statistical analysis and modeling;
- graphical facilities for data analysis and display, either on-screen or on hardcopy;
- a large choice of libraries concerning machine learning and artificial intelligence.
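As a brief illustration of the array operators and data-handling facilities listed above, here is a minimal Python example; the matrix and the measurement table are hypothetical:

```python
import numpy as np
import pandas as pd

# Operators for calculations on arrays, in particular matrices.
A = np.array([[2.0, 0.0], [1.0, 3.0]])
b = np.array([4.0, 11.0])
x = np.linalg.solve(A, b)        # solve A @ x = b
print(x)                         # -> [2. 3.]

# Data handling and storage: a labelled table with summary statistics.
df = pd.DataFrame({"temp": [10.5, 11.2, 9.8], "power": [42.0, 40.5, 44.1]})
print(df.describe())             # mean, std, quartiles per column
df.to_csv("measurements.csv", index=False)   # simple persistent storage
```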
Experience
During my three years of working experience in data analytics I have carried out the following tasks:
- Creation of a k-means clustering procedure for the BIRT environment, Zug (service project);
- Creation of an interpolated large-scale statistical model for the calculation of air temperature at an arbitrary location in Switzerland (research project);
- Investigation of the systems' behavior in an administrative building using energy data and statistical methods, Aarau (research project);
- Investigation of the systems' behavior in a passive house using energy data and clustering procedures, Schaffhausen (research project);
- Analysis of schools' and governmental buildings' consumption using energy data and clustering procedures, Zurich (research project);
- Training and testing of neural networks on simulated data for subsequent clustering of time trends, Lucerne (research project).