Nousot logo

Data Exploration

Tool to learn more about time series data using statistical checks, algorithms, and breakdowns.

Decomposition

Decomposition is a quick way to understand the main components of most, but not all, time series. It takes your selected time series and breaks it into three components: trend, seasonality, and residual. Adding the three components together at each point in time gives back the original series.

It is important to note that the three series returned by the decomposition are not necessarily true or correct. They are simply the best estimate produced by the decomposition algorithm implemented in statsmodels.
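The additive idea above can be sketched in plain Python. This is a minimal illustration of classical moving-average decomposition. It assumes a known seasonal period, for example 7 for daily data with weekly seasonality. It is not necessarily the exact algorithm or settings the tool uses, and the names `centered_moving_average` and `additive_decompose` are hypothetical. In practice, statsmodels exposes this kind of decomposition through `statsmodels.tsa.seasonal.seasonal_decompose`.

```python
from statistics import mean


def centered_moving_average(values, period):
    """Estimate the trend with a moving average centered on each point.

    Returns None where the window does not fit (the first and last
    period // 2 points). For an even period, a 2 x period average is
    used so that the window stays centered on a single point.
    """
    n = len(values)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half : t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: the window holds period + 1 points, so the two
            # end points get half weight.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def additive_decompose(values, period):
    """Split values into trend, seasonal and residual (hypothetical helper).

    Assumes an additive model: value = trend + seasonal + residual.
    """
    if period < 2 or len(values) < 2 * period:
        raise ValueError("need period >= 2 and at least two full periods")
    trend = centered_moving_average(values, period)

    # Average the detrended values at each position within the cycle.
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(values, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    raw = [mean(b) for b in buckets]

    # Center the pattern so it sums to zero over one cycle.
    offset = mean(raw)
    pattern = [s - offset for s in raw]
    seasonal = [pattern[t % period] for t in range(len(values))]

    # Whatever trend and seasonality do not explain is left as residual.
    residual = [
        None if tr is None else y - tr - s
        for y, tr, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, residual
```

For example, a 28-day series built from a linear trend plus a repeating weekly pattern comes back with that pattern as the seasonal component and residuals of essentially zero. The trend, and therefore the residual, is undefined (`None`) for the first and last three points, because the centered 7-point window does not fit there.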

  • The trend represents long-term patterns, often a basic pattern such as the overall growth or decline of a market.
  • The seasonal component gives insight into repeating patterns in the data. Many business series show a strong day-of-week seasonality, with spikes or drops on weekends, and an annual seasonality from winter to summer often exists as well. This component allows those patterns to be viewed more closely.
  • The residual is everything the trend and seasonal components cannot account for. A larger residual is usually a sign of a series that is harder to forecast, and if a user can discern a pattern in the residuals, that pattern can sometimes suggest useful regressors.
Effect of decomposing a time series into linear components
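One way to put a number on the "bigger residual" mentioned above is to compare the residual's variance with the variance of the original series. This ratio is an illustrative assumption, not a metric the tool is known to report, and `residual_share` is a hypothetical name. It accepts the residual list from any additive decomposition and skips points where the residual is undefined (`None`), such as the edges of a moving-average trend.

```python
from statistics import pvariance


def residual_share(observed, residual):
    """Fraction of the series' variance left in the residual (hypothetical).

    Points where the residual is None are dropped from both inputs so the
    two variances are computed over the same time points.
    """
    pairs = [(y, r) for y, r in zip(observed, residual) if r is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two points with a defined residual")
    ys = [y for y, _ in pairs]
    rs = [r for _, r in pairs]
    total = pvariance(ys)
    if total == 0:
        raise ValueError("observed series is constant")
    return pvariance(rs) / total
```

A value near 0 means trend and seasonality explain most of the variation; a value near 1 means the residual dominates, which usually points to a series that will be harder to forecast.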

Autocorrelation

Autocorrelation measures how closely the value of a series at a given time is linked to its own value at an earlier time, a fixed number of steps (a lag) in the past. Autocorrelations can reveal hidden patterns in a time series, and they are also useful for understanding and anticipating outcomes. If a user notices a strong positive autocorrelation at a lag of 7 days, they know they can generally anticipate how the series will perform by looking at what happened 7 days earlier. This is the basis of autoregressive models such as ARIMA. Unfortunately, most real-world data does not depend strictly on a single lag.

Chart showing autocorrelation at different lags
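The lag idea can be sketched as the sample autocorrelation function. For lag k, it sums the products of each value's deviation from the mean with the deviation k steps earlier, then divides by the total sum of squared deviations. This is the standard textbook estimator; the tool's chart may use a different variant, and the names `autocorrelation` and `acf` here are hypothetical.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of values at a single lag (hypothetical helper)."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(values) - 1")
    mu = sum(values) / n
    dev = [v - mu for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Pair each point with the point `lag` steps before it.
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return num / denom


def acf(values, max_lag):
    """Autocorrelations for lags 0..max_lag, the values plotted in an ACF chart."""
    return [autocorrelation(values, k) for k in range(max_lag + 1)]
```

For a daily series with a single weekly spike repeated over eight weeks, `acf(values, 10)` peaks at lag 7 with a value of 0.875, while every other lag from 1 to 10 is negative.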

Data Distribution

The distribution of the data is shown as a histogram. There is no right or wrong distribution. Some models work better with data shaped more like a normal distribution, with most values clustered in a single hill in the center of the plot, but this is not an absolute requirement. In practice, the distribution is mostly useful for catching potential errors in the data: an unusually large group of very large or very small values is often worth further inspection.

Histogram chart showing the distribution of a time series' data points
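Both uses of the distribution can be sketched with the standard library: an equal-width histogram, and a check for extreme values. The flagging rule here, Tukey's fences (values more than 1.5 interquartile ranges below the first quartile or above the third), is a common rule of thumb swapped in as an assumption; nothing above says the tool applies it. `histogram` and `flag_extremes` are hypothetical names, and the default of 10 bins is arbitrary.

```python
from statistics import quantiles


def histogram(values, bins=10):
    """Equal-width histogram as (bin_start, bin_end, count) tuples."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: a single bin holds everything.
        return [(lo, hi, len(values))]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value goes into the last bin rather than one past it.
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def flag_extremes(values, k=1.5):
    """Values outside Tukey's fences (assumed rule; k=1.5 is the usual choice)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

Values returned by `flag_extremes` are candidates for a closer look, not automatically errors.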