**Horizon 2020, research and innovation programme**

**Marie Sklodowska-Curie grant agreement No 708501**

**Project Title: Forecasting with large datasets: A time varying covariance matrix approach**

The basic aim of this project is the estimation of large dimensional, time-varying covariance matrices. In other words, we propose a general framework that allows us to estimate the links among an increasingly large set of variables (economic or not) when these links are non-constant across time. To this end, we address two interesting and realistic features of real datasets that have barely been tackled together in the literature so far: time variation and the large dimensionality of economic datasets.

Time variation in economic relationships has been studied extensively in economics. It can be seen either as abrupt shifts in the assumed generating mechanisms of the variables, or as smooth stochastic or deterministic changes in those mechanisms. Either way, it can be considered the result of forces such as institutional switching, economic transition, preference fluctuations, policy transformations or technological change. All of these can imply instabilities in the assumed economic relationships. Moreover, in other scientific areas such as biology, physics and the health sciences, evolutionary arguments make the assumption of time variation highly plausible.

Large datasets are nowadays a key characteristic of human development. For instance, computers are in the middle of most economic transactions. These computer-mediated transactions generate huge amounts of data that can be analyzed in order to extract information. This information is relevant for answering economic policy questions and can be key to various scientific discoveries. In large datasets, conventional statistical and econometric techniques such as sample covariance estimation or regression coefficient estimation fail to work consistently, due to the dimensionality of the parameters that need to be estimated. For instance, in a linear economic relationship we frequently have *T* observations of a dependent variable (*y*) as a function of many potential predictors (*p* predictors). When the number of predictors *p* is large relative to, or larger than, the temporal dimension *T*, a regression with all available covariates becomes extremely problematic, if not impossible. Analogously, when our aim is to estimate the large covariance matrix of the *p* predictors, the sample covariance matrix becomes highly unreliable. It is also computationally demanding, since the number of elements to be estimated grows with the square of the dimension of the dataset under analysis. The current literature provides some good answers, but only under the assumption that the covariance matrix of the true data generating mechanism is constant across time.
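The dimensionality problem described above can be illustrated with a small simulation. The following is only a minimal sketch under assumed settings (the sizes `T = 50` and `p = 200`, the seed, and the Gaussian data are hypothetical choices, not taken from the project): when *p* exceeds *T*, the sample covariance matrix cannot have full rank, even though the true covariance matrix here is the identity.

```python
import numpy as np

# Hypothetical sizes chosen for illustration: more variables than observations.
T, p = 50, 200

rng = np.random.default_rng(0)
X = rng.standard_normal((T, p))   # T observations of p independent variables

# Sample covariance matrix of the p variables (columns are variables).
S = np.cov(X, rowvar=False)

# After demeaning, at most T - 1 directions carry information,
# so S is singular whenever p > T - 1.
rank_S = np.linalg.matrix_rank(S)

# Number of distinct elements in a symmetric p x p matrix: grows like p^2.
n_free = p * (p + 1) // 2

print(f"shape of S: {S.shape}")
print(f"rank of S: {rank_S} (cannot exceed T - 1 = {T - 1})")
print(f"distinct covariance elements to estimate: {n_free}")
print(f"available data points: {T * p}")
```

Because the matrix is singular, it cannot be inverted, which is also why a regression on all *p* covariates has no unique least-squares solution in this setting.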

These two aspects are important characteristics of real datasets, and failure to provide a framework that can accommodate them will result in unreliable scientific discoveries. In economics, this implies that the developed models will be insufficient to capture important characteristics of the economy, delivering false or unsuccessful policy suggestions.

We provide a unified framework, with good theoretical properties, that can accommodate both of these aspects of real datasets. As we show, this leads to significant improvements, across a wide range of metrics, over the methodologies that currently dominate this literature.

Estimation of time-varying covariance matrices for large datasets

Portfolio selection with large dimensional covariance matrices

Yiannis Dendramis's main research areas include topics in theoretical and empirical econometrics, with applications in finance and macroeconomics. These involve the econometric modelling of large datasets, volatility modelling, econometric forecasting and asset pricing. Prior to joining the University of Cyprus, he was a visiting Lecturer and research fellow at the School of Economics and Finance, Queen Mary University of London. He has also made extended research visits to the University of Southern California, Los Angeles, and King's College London. His work has been published in journals such as the Journal of Economic Dynamics and Control, the Journal of Empirical Finance, and the Journal of Forecasting.

Figure 1

In Figure 1, Panel A depicts the path of the (i,j)-th element of the true stochastic covariance matrix (solid line) and the path estimated by the time-varying, regularized Bickel-Levina estimator (dashed line). Panel B depicts the corresponding paths for the deterministic covariance matrix. In both cases, the true covariance matrix is of dimension 5x5 and sparse. The proposed estimator appears to be remarkably precise: thresholding recovers well the path of the zero element in position (i,j)=(5,4), filters out minor correlations in the paths of the time-varying covariance (see (i,j)=(1,2)) and estimates significant correlations with satisfactory quality (see (i,j)=(1,2), (1,1), (5,5)). For more details, see the paper: Estimation of time-varying covariance matrices for large datasets.
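To make the idea of a time-varying, thresholded covariance estimate concrete, the following is a minimal sketch and not the estimator of the paper. It assumes a Gaussian kernel to weight observations near the time point of interest, followed by hard thresholding of the off-diagonal entries in the spirit of Bickel and Levina. The function name `tv_threshold_cov`, the bandwidth `h`, the threshold `lam` and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def tv_threshold_cov(X, t, h, lam):
    """Kernel-weighted covariance around time index t, then hard thresholding.

    X   : (T, p) data matrix
    t   : time index at which the covariance is estimated
    h   : bandwidth as a fraction of the sample size (assumed choice)
    lam : threshold below which off-diagonal entries are set to zero
    """
    T, _ = X.shape
    u = (np.arange(T) - t) / (h * T)     # scaled distance from t
    w = np.exp(-0.5 * u**2)              # Gaussian kernel weights (assumption)
    w /= w.sum()
    mu = w @ X                           # local weighted mean
    Xc = X - mu
    S = (Xc * w[:, None]).T @ Xc         # local weighted covariance
    S = 0.5 * (S + S.T)                  # enforce exact symmetry
    S_thr = np.where(np.abs(S) >= lam, S, 0.0)   # hard thresholding
    np.fill_diagonal(S_thr, np.diag(S))  # variances are never thresholded
    return S_thr

# Simulated example: only variables 0 and 1 are correlated,
# and their correlation changes smoothly over time.
rng = np.random.default_rng(0)
T, p = 1000, 5
rho = 0.8 * np.sin(2 * np.pi * np.arange(T) / T)
Z = rng.standard_normal((T, p))
X = Z.copy()
X[:, 1] = rho * Z[:, 0] + np.sqrt(1 - rho**2) * Z[:, 1]

lam = 0.3
S_hat = tv_threshold_cov(X, t=250, h=0.1, lam=lam)
print(np.round(S_hat, 2))
```

In this sketch the bandwidth controls how local the estimate is, while the threshold controls how aggressively small entries are set to zero. Both are fixed by hand here; in practice they would need to be chosen in a data-driven way.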

Figure 2

In Figure 2 we study the inclusion frequency of a large set of possible regressors in a linear forecasting model for two macroeconomic variables. This is reported as the percentage of times that each regressor is included in the forecasting equation. We compare the Lasso and Adaptive Lasso methods with our sparse testing approach. The results indicate that our proposals lead to smaller, more parsimonious models, which in turn lead to better forecasts, as presented in the paper: A regularization approach for estimation and variable selection in high dimensional regression models.

A comprehensive presentation of selected results