WG3 – Transparency into Investment Product Performance for Clients

Active investment products, taken collectively, often fail to outperform passive products such as ETFs (Exchange Traded Funds), which charge lower fees because they simply replicate broad market indices. In particular, so-called “smart beta” strategies offered by banks have systematically underperformed. An empirical analysis by Suhonen et al. (2016) demonstrated endemic “overfitting” of banks’ investment strategies during their development phase and significant underperformance after they go live. Lopez de Prado and Lewis (2018) attribute this effect to a “proliferation of false discoveries” about sources of investment performance and call it “the greatest threat faced by finance as an industry and an academic discipline”. Institutional investors draw on the experience and expertise of specialized investment consultants such as Mercer, Willis Towers Watson and Siglo to assess potential investments, but private clients and smaller investors do not have access to such services. The academic statistical literature offers methods to quantify the overfitting problem, but these methods require data about “failed trials” that are available only to the product vendors.

Because investors find it difficult to screen active investment managers ex-ante, and because actively managed investment funds have underperformed passive investment strategies ex-post, active funds have experienced a substantial outflow of assets under management in favour of passive ones. This has led to a substantial reduction in expensive independent research activities, which only active funds can afford. Active managers publicly warn that this may reduce market efficiency.

Description of the Challenge (Main Aim)

The investment challenge for clients is the decision between passively and actively managed products of both asset managers and banks. This decision, however, requires data and methods for evaluating investment performance, which give rise to a data challenge and an analytic challenge, respectively.

The data challenge arises because judging the risk-adjusted performance of a product requires estimating its risk as well as its (expected) return. This in turn requires long time series of product returns; without them, valid statistical inference on risk-adjusted performance is impossible. This poses a data availability problem: investment fund time series are widely available because NAVs (net asset values) must be published at regular intervals, but other financial products usually have a fixed expiry date and are no longer visible after expiry. To circumvent this problem, indices that replicate the payoffs of the products over time must be calculated. This, in turn, requires long time series of data on the prices of underlying assets and on market conditions (risk factors), which must be collected and stored for each product. Some of the required information, such as the execution costs of implementing the strategies underlying the products, is also not readily observable and must be modelled or otherwise inferred.
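To illustrate why long return series matter, the following Python sketch estimates a Sharpe ratio, one common measure of risk-adjusted performance, together with an approximate standard error. It assumes independent and identically distributed returns, for which the large-sample variance of the estimator is roughly (1 + SR^2/2)/T; real product returns are often autocorrelated and fat-tailed, so the figures are indicative only. All function names are hypothetical and chosen for this illustration.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free for r in returns]
    return statistics.fmean(excess) / statistics.stdev(excess)


def sharpe_standard_error(sr, n_obs):
    """Approximate standard error of an estimated Sharpe ratio.

    Assumption: i.i.d. returns, so Var(SR_hat) is about (1 + SR^2 / 2) / T.
    """
    return math.sqrt((1.0 + 0.5 * sr * sr) / n_obs)


def min_observations(sr, z=1.96):
    """Smallest T for which sr / SE(sr, T) >= z under the same assumption."""
    if sr == 0.0:
        raise ValueError("a zero Sharpe ratio can never be distinguished from zero")
    return math.ceil(z * z * (1.0 + 0.5 * sr * sr) / (sr * sr))


if __name__ == "__main__":
    monthly_sr = 0.1  # hypothetical per-month Sharpe ratio
    print(min_observations(monthly_sr))
    print(sharpe_standard_error(monthly_sr, 60))
```

Under these assumptions, a monthly Sharpe ratio of 0.1 needs 387 observations, roughly 32 years of monthly data, before it differs from zero at a two-sided z-value of 1.96. Most products with a fixed expiry date live for far less time than that, which is why replicating indices are needed.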

In terms of methodology, one key challenge is that the “failed trials” produced during the development process are unknown to anyone who was not involved in that process. In these situations, the analytical tools reviewed by Bailey and Lopez de Prado (2014), Bailey et al. (2015) and Lopez de Prado and Lewis (2018), among others, are not applicable. In contrast to the product development process in the pharmaceutical industry, there is also no regulation on how to set up a backtest (including the choice of an appropriate benchmark), what data must be stored to ensure replicability, how to test parameter sensitivity and how to deal with failed trials. Hence, the analytic challenge is to provide industry with guidance on how to deal with these methodological issues and regulators with guidance on how to formulate the necessary regulation.

Overall, the scientific challenge and main aim of this WG is thus to propose consistent and reliable methods, together with the necessary data, for choosing investment products ex-ante and evaluating their performance ex-post.

Progress beyond the state-of-the-art

First, this WG will address the data availability challenge. The WG will collect time series data on investment funds, insurance-linked investment products and banks’ products, their underlying assets and relevant market conditions (risk factors). The objective is to estimate risk-adjusted performance either directly or, where this is not possible (see the description of the challenge), by first calculating indices that replicate the payoffs of the products and estimating their execution and liquidation costs over time. Some of the data will be collected directly from exchanges and websites and can be exchanged freely within the network. Other parts of the data will be protected by the intellectual property (IP) rights of data vendors. Therefore, the WG will devise a strategy to develop algorithms, distribute them within the working group and run them on the decentrally stored data.
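One way such decentralised computation could work is sketched below in Python. It is an illustrative assumption, not a design decided by the WG: each data holder runs the same function on its locally stored, IP-protected returns and shares only aggregate sufficient statistics (count, sum, sum of squares), from which a coordinator recovers the pooled mean and standard deviation without ever seeing the raw series. The names LocalSummary, summarize_locally and pooled_mean_and_std are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalSummary:
    """Aggregates that leave a site; raw returns stay where they are stored."""
    count: int
    total: float
    total_sq: float


def summarize_locally(returns):
    """Run at each data holder on its own, locally stored return series."""
    return LocalSummary(
        count=len(returns),
        total=sum(returns),
        total_sq=sum(r * r for r in returns),
    )


def pooled_mean_and_std(summaries):
    """Run by the coordinator: combine site summaries into pooled statistics."""
    n = sum(s.count for s in summaries)
    if n < 2:
        raise ValueError("need at least two observations in total")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    # Sample variance from pooled sums; clamp tiny negative rounding errors.
    variance = max((total_sq - n * mean * mean) / (n - 1), 0.0)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    site_a = summarize_locally([0.01, 0.03])  # hypothetical licensed data
    site_b = summarize_locally([0.02, 0.04])  # hypothetical licensed data
    print(pooled_mean_and_std([site_a, site_b]))
```

The sum-of-squares formula is simple but can lose precision on long series with small variance; a production version would more likely merge running means and squared deviations instead. Whether sharing such aggregates is compatible with a given vendor licence is a legal question that this sketch does not address.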

Despite this data collection effort, the WG will still face limited data availability for some products and/or markets and performance attribution factors. To mitigate this problem, the WG will draw on methods developed by WG2 and contribute to their development.

Once the data availability challenge is addressed, the analytic challenge will be tackled. Researchers in the WG, as in many real-world settings, will not have access to the “failed trials” of the product developers, which implies that it will not be possible to apply directly the methods from the literature reviewed and systematized by Bailey and Lopez de Prado (2014), Bailey et al. (2015) and Lopez de Prado and Lewis (2018). As a mitigation measure, the WG will simulate the development process of rule-based financial products: it will start from generic versions of published factors and derive tweaked versions of these implementations until the performance characteristics of the published backtested time series of real investment products are matched. The WG will work to automate this “tweaking” process with machine learning approaches. The artificially generated “failed trials” will then serve as input to the published methodologies of Bailey and Lopez de Prado for quantifying the “overfitting bias”.
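The role of the simulated trials can be illustrated with a deliberately simplified Python sketch. Instead of tweaking real factor implementations, it draws pure-noise return series as stand-ins for the trials, so every strategy has a true Sharpe ratio of zero, and then reports the best in-sample Sharpe ratio that selection alone produces. The benchmark function follows the approximation for the expected maximum Sharpe ratio across N independent trials used in the deflated Sharpe ratio literature of Bailey and Lopez de Prado; it should be checked against the original publication before any real use. All function names, and the noise model itself, are assumptions made for this illustration.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def simulate_trial_sharpes(n_trials, n_obs, seed=0):
    """Sharpe ratios of pure-noise 'strategies' (true Sharpe ratio is zero).

    Assumption: i.i.d. standard normal returns stand in for the tweaked
    factor implementations described in the text.
    """
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_trials):
        returns = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        sharpes.append(statistics.fmean(returns) / statistics.stdev(returns))
    return sharpes


def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate expected maximum Sharpe ratio among unskilled trials."""
    if n_trials < 2:
        raise ValueError("the approximation needs at least two trials")
    inv_cdf = statistics.NormalDist().inv_cdf
    return math.sqrt(sharpe_variance) * (
        (1.0 - EULER_GAMMA) * inv_cdf(1.0 - 1.0 / n_trials)
        + EULER_GAMMA * inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    )


if __name__ == "__main__":
    trials = simulate_trial_sharpes(n_trials=200, n_obs=250, seed=42)
    best = max(trials)
    hurdle = expected_max_sharpe(len(trials), statistics.variance(trials))
    print(f"best in-sample Sharpe ratio: {best:.3f}")
    print(f"expected maximum under no skill: {hurdle:.3f}")
```

Even though no trial has any skill, the best of many trials shows a clearly positive in-sample Sharpe ratio. A published backtest would need to exceed the no-skill hurdle, which grows with the number of trials, before it could count as evidence of genuine performance; this is why the number and dispersion of “failed trials” matter.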


The main objectives of the working group are:

  1. Pruning and improvement of the vast array of performance attribution models by contributing to the development of methodologies for reducing the false discovery rate in financial research and applied financial investment management (long-term scientific impact).
  2. Dissemination of the results of the other two objectives to the public and to regulators, through presentations at public conferences and active contact with European regulators.
  3. Creation of the first European platform comparing the out-of-sample performance of banks’ investment products, insurance-linked investment products and asset management products available to the general public (industry impact).