Closet Indexing Analysis: Python Snippets for Assessing Active Managers
Use these code snippets to investigate active management skill.
Actively managed funds typically charge higher fees than index funds based on the premise that the managers’—and their investment teams’—efforts generate excess returns or alpha.
How well do active managers deliver on that premise?
Investment style analysis can assess the impact of manager decisions. Copy and paste the snippets below for a deep dive into closet indexing and market timing ability.
For more real-life case studies and examples, download the style analysis guide.
How to Detect Potential Closet Indexing
Closet indexing, or mimicking an index under the guise of active management, is a misleading practice that can cost investors in the long run.
Sometimes, active funds may successfully generate alpha relative to the broad market or even the Fama-French 3 Factor model. However, that “alpha” may simply be exposure to a priced factor that the benchmark model leaves out. If so, investors could achieve the same results with a low-cost, tax-friendly ETF without the added expenses of active management.
Style analysis can help uncover potential closet indexing. Implement the statistical tests below. We’ll use the Morningstar Python package as our data source.
The Morningstar data Python package gives data scientists, quants, and engineers convenient access to Morningstar data and the flexibility to use it in their preferred coding environments. To use the Python package, start by logging into Analytics Lab to retrieve your authentication token. Access full instructions on the developer site.
INPUT
import morningstar_data as md
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from tqdm import tqdm  # progress bar used in the regression loops below
Let’s focus on actively managed mutual funds. To make the analysis manageable, limit the dataset to equity funds that:
Have at least USD 1 billion in total assets.
Have more than 5 years of live track record.
Charge fees higher than their Morningstar Category average.
Are domiciled in the United States and invest exclusively in the United States.
Finally, limit the search to only one share class per fund. With this set of conditions, or filters, you can formulate the universe as a Search Criterion. For easier analysis, you could also use Morningstar Direct’s visual interface to search for relevant funds.
INPUT
criteria = {
    "universeId": "FO",  # Universe is limited to open-end funds
    "subUniverseId": "",
    "subUniverseName": "",
    "securityStatus": "activeonly",  # Surviving funds only
    "useDefinedPrimary": True,  # Only one share class per fund
    "criteria": [
        {"relation": "", "field": "OF00C", "operator": "=", "value": "0"},  # Index Fund = False
        {"relation": "AND", "field": "OF009", "operator": ">", "value": "1000000000"},  # Fund Size USD > $1 billion
        {"relation": "AND", "field": "LS017", "operator": "=", "value": "USA"},  # Domicile is USA
        {"relation": "AND", "field": "OS05P", "operator": ">", "value": "CAT AVG"},  # Prospectus Net Expense Ratio higher than category average
        {"relation": "AND", "field": "OF035", "operator": "=", "value": "$BCG$EQUTY"},  # Broad asset class is equity
        {"relation": "AND", "field": "OS00F", "operator": "<=", "value": "04/30/2019"},  # Inception on or before Apr 30, 2019
        {"relation": "AND", "field": "OS63Z", "operator": "=", "value": "AREA$$$NAU"},  # Area of investment is USA, i.e., non-international funds
    ]
}
INPUT
fund_list = md.direct.get_investment_data(investments=criteria, data_points=[{"datapointId": "OS01W"}, {"datapointId": "OS00I"}, {"datapointId": "OS385"}])
fund_list.head(3)
OUTPUT
INPUT
len(fund_list)
OUTPUT
394
Now that you’ve identified nearly 400 candidate funds, obtain five years of monthly returns for these funds to use in the regression models. This snippet uses the Python package.
INPUT
%%time
fund_returns_raw = md.direct.get_returns(
    investments=fund_list['SecId'].tolist(),
    start_date='2019-06-30',
    end_date='2024-05-31'
)
fund_returns_raw.head(3)
OUTPUT
Pivot the returns into a wide format.
INPUT
fund_returns_wide = (
    fund_returns_raw
    .pivot(
        index='Date',
        columns='Id',
        values='Monthly Return'
    )
    .div(100)
)
fund_returns_wide.head(3)
OUTPUT
3 rows x 394 columns
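To make the reshaping step concrete, here is a minimal sketch on a toy long-format table. The fund identifiers and return values are made up for illustration, and the column names simply mirror those used in the snippet above; the actual output of the Morningstar package may contain other columns.

```python
import pandas as pd

# Hypothetical long-format returns: one row per (fund, month), in percent.
toy_long = pd.DataFrame({
    "Id": ["FUND_A", "FUND_A", "FUND_B", "FUND_B"],
    "Date": pd.to_datetime(["2024-01-31", "2024-02-29", "2024-01-31", "2024-02-29"]),
    "Monthly Return": [1.5, -0.5, 2.0, 0.25],
})

# One row per date, one column per fund, with returns converted to decimals.
toy_wide = (
    toy_long
    .pivot(index="Date", columns="Id", values="Monthly Return")
    .div(100)
)
print(toy_wide)
```

The wide layout is convenient here because each fund becomes a column that can be used directly as the dependent variable in a regression formula.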
Now load the factor premia. This snippet uses premia from the Morningstar North America Standard Risk Model. The model goes beyond the Fama-French alternative to consider additional factors like:
Liquidity, or how well a fund can meet cash flow demands.
Volatility, or variations in returns.
Yield, or the income generated by an investment.
Quality. High-quality firms have high profitability and low financial leverage.
INPUT
mstar_premia_raw = pd.read_csv('mstar-NA-premia.csv')
mstar_premia_raw.head(3)
OUTPUT
Format to set the date as an index and decimalize by dividing by 100.
INPUT
mstar_premia = (
    mstar_premia_raw
    .assign(Date=lambda x: pd.to_datetime(x.Date))
    .set_index('Date')
    .div(100)
)
OUTPUT
With all the information loaded, focus on experimentation. Let’s perform a simple test.
Train a multivariate linear regression model with the fund returns on the LHS and the factor model on the RHS.
Observe the goodness of fit of the model via its F-statistic.
Note the R-squared of all well-fit models.
Funds with very high R-squared values are those where nearly all of the variation in returns can be explained by systematic risk factors. Since such exposure should be cheap, investors shouldn’t have to pay high fees for these funds.
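The intuition behind the R-squared test can be illustrated with simulated data. The sketch below is not part of the Morningstar workflow: it uses a single made-up market factor and two hypothetical funds, one that hugs the factor and one that takes large idiosyncratic bets, and fits each with plain least squares in NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 60

# Synthetic monthly market excess returns (an assumption, not real data).
market = rng.normal(0.008, 0.045, n_months)

# A closet-indexer-like fund: market exposure plus very little idiosyncratic noise.
closet_fund = 1.0 * market + rng.normal(0.0, 0.002, n_months)
# A more active fund: same market exposure but much larger idiosyncratic bets.
active_fund = 1.0 * market + rng.normal(0.0, 0.03, n_months)


def r_squared(y, x):
    """R-squared of an OLS regression of y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()


rsq_closet = r_squared(closet_fund, market)
rsq_active = r_squared(active_fund, market)
print(f"closet-like fund R^2: {rsq_closet:.3f}")
print(f"active-like fund R^2: {rsq_active:.3f}")
```

The fund with almost no idiosyncratic return ends up with an R-squared close to 1, which is the pattern the thresholds in this analysis are designed to flag.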
First, create three empty lists to store funds.
INPUT
rsq_95 = []
rsq_98 = []
rsq_99 = []
Fetch the available fund return dates, available premia dates, common dates, and risk-free rates.
INPUT
fr_idx = fund_returns_wide.index
ms_idx = mstar_premia.index
cd = fr_idx.intersection(ms_idx)
rfr = mstar_premia.loc[cd, 'Cash']
And create the excess returns dataframe.
INPUT
fund_returns_excess = fund_returns_wide.sub(rfr, axis=0)
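The subtraction above relies on pandas index alignment. Here is a small sketch of that behavior with made-up numbers; the fund names, dates, and rates are hypothetical.

```python
import pandas as pd

dates = pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-31"])

# Hypothetical fund returns in decimals: rows are dates, columns are funds.
toy_returns = pd.DataFrame(
    {"FUND_A": [0.020, -0.010, 0.015], "FUND_B": [0.005, 0.000, 0.030]},
    index=dates,
)
# Hypothetical risk-free rate, available only for the first two dates.
toy_rfr = pd.Series([0.004, 0.004], index=dates[:2])

# axis=0 matches the Series index against the row (date) index,
# so the same rate is subtracted from every fund on a given date.
toy_excess = toy_returns.sub(toy_rfr, axis=0)
print(toy_excess)
```

Dates that have no risk-free rate come out as NaN rather than raising an error, so it is worth checking for missing values before fitting any model.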
Now create a combined dataset.
INPUT
ms_dataset = fund_returns_excess.join(mstar_premia, how='inner', validate='1:1')
ms_dataset = ms_dataset.dropna(axis=1)
Perform the test iteratively and store the results in the lists.
INPUT
mdc = ms_dataset.columns
frc = fund_returns_excess.columns
valid_funds = list(set(mdc) & set(frc))
model_fit = []
formula_rhs = "Excess + Size + Style + Yield + Volatility + Momentum + Quality + Liquidity"
for fund in tqdm(valid_funds):
    model = smf.ols(formula=f"{fund} ~ {formula_rhs}", data=ms_dataset).fit()
    # Only for well fitted models, check R-squared
    if model.f_pvalue < 0.01:
        model_fit.append(fund)
        # Add funds to lists depending upon R-squared value
        if model.rsquared > 0.99:
            rsq_99.append(fund)
        elif model.rsquared > 0.98:
            rsq_98.append(fund)
        elif model.rsquared > 0.95:
            rsq_95.append(fund)
        else:
            pass
    else:
        continue
INPUT
len(model_fit)/len(valid_funds)
OUTPUT
1.0
Note that 100% of the valid funds produced well-fit models, that is, models with an F-statistic p-value below 0.01.
Now visualize the findings.
INPUT
labels = '99%', '98%', '95%', 'N.M.'
sizes = [len(rsq_99), len(rsq_98), len(rsq_95), len(model_fit) - len(rsq_99) - len(rsq_98) - len(rsq_95)]
fig, ax = plt.subplots()
explode = (.1, .1, 0, 0)
ax.pie(sizes, labels=labels, explode=explode, colors=plt.get_cmap("tab20").colors[:4],autopct='%1.1f%%')
plt.show()
OUTPUT
Only 1.8% of the valid funds reviewed are potentially closet indexing. While more than 95% of the variation in returns of another 39% of funds can be explained by the factors, the threshold allows sufficient room for some active management.
How to Assess Market Timing Ability
Managers may claim they can time the market to the upside and shield against the downside. If true, that’s worth paying for. Here’s how to test this claim.
Treynor and Mazuy proposed one approach that closely resembles returns-based style analysis. (1)
This method regresses a fund’s excess returns on both the excess returns of the market and on its squared form. In a given time period $t$, for a fund $f$, market proxy $m$ and risk-free rate $r_{c,t}$, the model is given by:
$$r_{f,t} - r_{c,t} = \alpha + \beta\,(r_{m,t} - r_{c,t}) + \gamma\,(r_{m,t} - r_{c,t})^2 + \epsilon_t$$
A statistically significant, positive $\gamma$ indicates market timing ability. A statistically significant negative $\gamma$ would indicate that the manager times the market in the wrong direction.
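As a sanity check on the method, the following sketch simulates a fund with a known positive timing coefficient and recovers it with ordinary least squares. All inputs are synthetic and the parameter values are arbitrary assumptions; the standard errors are the classical OLS ones, computed by hand in NumPy rather than with statsmodels.

```python
import numpy as np

rng = np.random.default_rng(42)
n_months = 240

# Synthetic market excess returns (illustrative only).
mkt_excess = rng.normal(0.006, 0.045, n_months)

# Simulated fund with a built-in convex payoff: true gamma = 2.0.
true_alpha, true_beta, true_gamma = 0.0, 0.9, 2.0
fund_excess = (
    true_alpha
    + true_beta * mkt_excess
    + true_gamma * mkt_excess**2
    + rng.normal(0.0, 0.005, n_months)
)

# Design matrix: intercept, market excess return, squared market excess return.
X = np.column_stack([np.ones(n_months), mkt_excess, mkt_excess**2])
coef, *_ = np.linalg.lstsq(X, fund_excess, rcond=None)
alpha_hat, beta_hat, gamma_hat = coef

# Classical OLS standard errors and the t-statistic for gamma.
resid = fund_excess - X @ coef
sigma2 = resid @ resid / (n_months - X.shape[1])
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_gamma = gamma_hat / std_err[2]
print(f"gamma estimate: {gamma_hat:.2f}, t-statistic: {t_gamma:.2f}")
```

A positive coefficient on the squared term means the fund's return is convex in the market return: it captures more of the upside than the downside, which is what successful timing would look like.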
The data set needed for this analysis is the same as above. In fact, for this model, you don’t need any factors beyond the excess market returns.
Start by squaring the excess returns.
INPUT
ms_dataset['Excess_sq'] = ms_dataset['Excess'] ** 2
Create lists to store results.
INPUT
good_timers = []
bad_timers = []
no_ability = []
Next, fit the regression models.
INPUT
formula_rhs = "Excess + Excess_sq"
model_fit = []
for fund in tqdm(valid_funds):
    model = smf.ols(formula=f"{fund} ~ {formula_rhs}", data=ms_dataset).fit()
    # Only for well fitted models, evaluate timing
    if model.f_pvalue < 0.05:
        model_fit.append(fund)
        # Evaluate only highly significant results
        if model.pvalues.loc['Excess_sq'] < 0.05:
            if model.params.loc['Excess_sq'] > 0:
                good_timers.append(fund)
            else:
                bad_timers.append(fund)
        else:
            no_ability.append(fund)
    else:
        continue
Here’s how to visualize the findings.
INPUT
labels = 'Good Timing', 'Bad Timing', 'No Ability'
sizes = [len(good_timers), len(bad_timers), len(no_ability)]
fig, ax = plt.subplots()
explode = (.1, .1, 0)
ax.pie(sizes, labels=labels, explode=explode, colors=plt.get_cmap("tab20").colors[:4],autopct='%1.1f%%')
plt.show()
OUTPUT
A poor result for proponents of active management.
According to this analysis, 94% of management teams have no market timing ability. The remaining 6% have significant market timing ability, but in the wrong direction. These funds increased their exposure to the market when it was declining and reduced market exposure during periods of growth.
This result would suggest that even if investors wish to buy expensive actively managed funds to access managerial skill, they should know that market timing isn’t a significant component of that skill.
While these results are valid, please supplement them with deeper dives into holdings and other analyses before drawing conclusions about management strategies. One option is to run rolling regressions and examine whether the results are consistent through time.
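A rolling version of the timing regression could look like the sketch below. The function name `rolling_timing_gamma`, the 36-month window, and the simulated inputs are all assumptions made for illustration; in practice you would pass in a fund's excess returns and the market excess returns from the combined dataset.

```python
import numpy as np
import pandas as pd


def rolling_timing_gamma(fund_excess, mkt_excess, window=36):
    """Estimate the squared-market coefficient over rolling windows.

    Both inputs are pandas Series of excess returns on the same date index.
    Returns a Series of gamma estimates indexed by each window's end date.
    """
    gammas = {}
    for end in range(window, len(fund_excess) + 1):
        y = fund_excess.iloc[end - window:end].to_numpy()
        x = mkt_excess.iloc[end - window:end].to_numpy()
        X = np.column_stack([np.ones(window), x, x**2])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        gammas[fund_excess.index[end - 1]] = coef[2]
    return pd.Series(gammas, name="gamma")


# Synthetic data for illustration only: a fund with no built-in timing skill.
rng = np.random.default_rng(7)
dates = pd.date_range("2014-01-01", periods=120, freq="MS")
mkt = pd.Series(rng.normal(0.006, 0.045, 120), index=dates)
fund = 0.9 * mkt + pd.Series(rng.normal(0.0, 0.01, 120), index=dates)

gamma_path = rolling_timing_gamma(fund, mkt, window=36)
print(gamma_path.describe())
```

If the estimated coefficient flips sign from one window to the next, a single full-sample result is probably not strong evidence of timing skill or the lack of it.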
Unlock More Code Snippets for Fund Evaluation
With these code snippets, investment teams can make informed decisions about fund managers. These tips should give technical users more flexibility to customize analysis and manipulate data their way.
Looking for more ways to use Python for style analysis? The complete guide includes additional case studies, examples, and code snippets.
(1) Treynor, J.L. and Mazuy, K.K. “Can Mutual Funds Outguess the Market?” Harvard Business Review, Vol. 44, No. 4, pp. 131–136, 1966.