
Research Analysis: Financial Metric Evaluation with IC, Quintile, & Threshold Analysis

This document describes the quantitative research methodology used in the QuantAscent quant engine for evaluating factor predictiveness, constructing composite scores, and backtesting strategies.

Who This Document Is For

You don't need a statistics degree to use QuantAscent's Research tools — the app handles all the math for you. This document explains what the numbers mean so you can make better decisions about which metrics to include in your strategies. If you're new to quantitative investing, read the plain-language summaries (in blockquotes) at the top of each section and skip the formulas. If you want the full technical detail, it's all here too.

Using the app? See the Screen Guide for a walkthrough of the Research tab and every other screen.


Table of Contents

  1. Overview & Motivation
  2. Data Preparation & Forward Returns
  3. Information Coefficient (IC) Analysis
  4. Quintile Analysis
  5. Threshold Analysis
  6. Filter Testing
  7. Conditional IC Analysis
  8. Composite Scoring Systems
  9. Backtest Performance Metrics
  10. Metric Computation Reference

1. Overview & Motivation

The research framework answers a fundamental question in quantitative investing:

Given a universe of stocks and a set of observable metrics, which metrics reliably predict future returns, and how should we combine them into a selection strategy?

The pipeline proceeds in stages:

Raw Data Preparation --> Financial Metric Calculation --> Metric Evaluation --> 
Composite Score Generation --> Backtest --> Strategy Optimization

Each stage introduces statistical tools designed to separate genuine predictive signal from noise. The key analyses -- IC ranking, quintile decomposition, threshold sweeps, and filter tests -- each attack this problem from a different angle, providing converging evidence about metric quality.


2. Data Preparation & Forward Returns

In plain language: Before we can test which metrics predict stock returns, we need to organize the data. The "metrics matrix" is a big spreadsheet: each row is one stock on one date, and the columns are 120+ financial measurements (like P/E ratio, ROE, debt levels, etc.) plus the stock's actual return over the next quarter. This is the raw material every analysis in this document builds on.

2.1 The Metrics Matrix

The foundation of all analysis is a cross-sectional panel (the "metrics matrix"): a tabular dataset where each row represents one (ticker, as_of_date) observation, and columns contain 120+ fundamental, valuation, growth, and technical metrics plus a target variable.

Construction:

  • Rebalance dates are generated at fixed intervals (default 90 days) across the historical sample (default 2010-2026).
  • For each rebalance date, metrics are computed for every ticker with sufficient data as of that date.
  • The result is augmented with categorical company info (sector, industry, country, exchange).

2.2 Forward Return (Target Variable)

The dependent variable in all analyses is FutureProfit -- the simple holding-period return over the rebalance window:

\[ \text{FutureProfit} = \frac{P_{t+\Delta}}{P_t} - 1 \]

where:

  • \(P_t\) = closing price on the rebalance (observation) date
  • \(P_{t+\Delta}\) = closing price on the date \(\Delta\) days forward (default \(\Delta = 90\))
  • If no price exists on exactly \(t + \Delta\), the most recent available price on or before that date is used.

Outlier filtering: Observations with \(\text{FutureProfit} \leq -1.0\) or \(\geq +1.0\) (i.e., returns exceeding \(\pm 100\%\)) are excluded from all analyses to prevent extreme values from dominating rank correlations and means.
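The forward-return construction and outlier filter can be sketched in a few lines of Python. This is an illustrative sketch, not the engine's actual code: `future_profit`, the ordinal-date representation, and `within_outlier_bounds` are hypothetical names.

```python
from bisect import bisect_right

def future_profit(dates, prices, t_idx, delta_days=90):
    """P_{t+delta} / P_t - 1, falling back to the most recent available
    price on or before t + delta when no exact match exists.
    dates: ordinal day numbers sorted ascending, aligned with prices."""
    target = dates[t_idx] + delta_days
    j = bisect_right(dates, target) - 1   # index of last date <= target
    if j <= t_idx:
        return None                       # no usable forward price
    return prices[j] / prices[t_idx] - 1

def within_outlier_bounds(r):
    """Exclude FutureProfit <= -1.0 or >= +1.0 (returns beyond +/-100%)."""
    return -1.0 < r < 1.0

# Day 90 has no price, so the day-89 close is used as P_{t+delta}
dates  = [0, 30, 60, 89, 120]
prices = [100.0, 105.0, 102.0, 110.0, 130.0]
r = future_profit(dates, prices, 0)       # 110/100 - 1 = 0.10
```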

2.3 Growth Rate Estimation

Many metrics involve estimating a growth rate from a time series of historical observations (e.g., FCF over the last 20 quarters). This is done via OLS regression on normalized values:

Given a series of \(n\) observations \(y_1, y_2, \ldots, y_n\) (ordered newest-first from the data source):

\[ \tilde{y}_i = \frac{y_i}{\bar{y}}, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^n y_i \]

Fit the linear model \(\tilde{y}_i = \beta_0 + \beta_1 x_i + \varepsilon_i\) via OLS, where \(x_i = i\) is the observation index:

\[ \beta_1 = \frac{n \sum x_i \tilde{y}_i - \sum x_i \sum \tilde{y}_i}{n \sum x_i^2 - (\sum x_i)^2} \]

The goodness-of-fit is captured by \(R^2\):

\[ R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}} = 1 - \frac{\sum(\tilde{y}_i - \hat{\tilde{y}}_i)^2}{\sum(\tilde{y}_i - \bar{\tilde{y}})^2} \]

A high \(R^2\) indicates the growth trend is consistent (not noisy), which is itself a useful signal.
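For concreteness, the normalization, OLS fit, and \(R^2\) can be sketched as below. One detail is an assumption: since the series arrives newest-first, the sketch reverses it so that a positive slope corresponds to growth over time.

```python
def growth_rate_and_r2(y):
    """OLS slope (beta_1) and R^2 on mean-normalized values, x_i = 0..n-1.
    Assumes y arrives newest-first and reverses it to oldest-first."""
    y = list(reversed(y))
    n = len(y)
    mean = sum(y) / n
    yt = [v / mean for v in y]                 # tilde{y}_i = y_i / ybar
    x = list(range(n))
    xbar = sum(x) / n
    ybar = sum(yt) / n                         # equals 1 by construction
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, yt))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = ybar - beta1 * xbar
    ss_res = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, yt))
    ss_tot = sum((yi - ybar) ** 2 for yi in yt)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return beta1, r2

# A perfectly linear FCF series (newest-first): R^2 is 1, slope is 10/120
slope, r2 = growth_rate_and_r2([140, 130, 120, 110, 100])
```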


3. Information Coefficient (IC) Analysis

In plain language: The IC answers a simple question: "If I rank all stocks by this metric, do the higher-ranked ones actually earn higher returns?" An IC of +0.05 means yes, there's a meaningful positive relationship. An IC near zero means no relationship. A negative IC means the relationship is inverted (lower values predict higher returns). You don't need to understand the math to use the Research screen — just look for metrics with higher |IC| values.

3.1 Definition

The Information Coefficient (IC) measures the cross-sectional rank correlation between a metric's values and subsequent realized returns. It answers: "Do stocks ranked highly on this metric tend to have higher future returns?"

We use Spearman's rank correlation rather than Pearson's because:

  • It captures monotonic (not just linear) relationships.
  • It is robust to outliers and non-normal distributions.
  • Financial return distributions are heavy-tailed; rank-based measures are more stable.

3.2 Calculation

For a given metric \(M\) with valid observations \(\{(m_i, r_i)\}_{i=1}^N\) where \(m_i\) is the metric value and \(r_i = \text{FutureProfit}_i\):

\[ \text{IC} = \rho_s(M, R) = 1 - \frac{6 \sum_{i=1}^N d_i^2}{N(N^2 - 1)} \]

where \(d_i = \text{rank}(m_i) - \text{rank}(r_i)\) is the difference in ranks.

In practice, tied ranks are handled appropriately and a two-tailed \(p\)-value is provided.
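A minimal implementation using scipy, which handles tied ranks and supplies the two-tailed \(p\)-value (the function name and missing-data handling are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def information_coefficient(metric_values, fwd_returns):
    """Cross-sectional IC: Spearman rank correlation between a metric and
    forward returns, dropping observations where either side is missing."""
    m = np.asarray(metric_values, dtype=float)
    r = np.asarray(fwd_returns, dtype=float)
    mask = ~np.isnan(m) & ~np.isnan(r)
    ic, p_value = spearmanr(m[mask], r[mask])
    return ic, p_value, int(mask.sum())

# Monotone but non-linear: Pearson would be < 1, Spearman IC is exactly +1
ic, p, n = information_coefficient([1, 2, 3, 4, 5],
                                   [0.01, 0.02, 0.08, 0.09, 0.50])
```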

Interpretation:

| IC Range | Interpretation |
| --- | --- |
| > +0.05 | Meaningful positive predictive power (higher metric → higher returns) |
| -0.05 to +0.05 | Weak or no predictive power |
| < -0.05 | Meaningful inverse predictive power (higher metric → lower returns) |

In cross-sectional equity research, ICs in the \(\pm 0.03\) to \(\pm 0.10\) range are typical for useful single factors. An IC of \(0.10\) is considered quite strong.

3.3 IC Ranking Scan

The IC ranking scan computes IC for every numeric metric in the matrix, then sorts by \(|\text{IC}|\) to surface the most predictive factors. Filters applied:

  • Minimum data coverage: at least 30% of observations must have non-null values for the metric.
  • Optional pre-filter (e.g., restrict to a sector or market-cap band).
  • Optional exclusion of dollar-denominated metrics (which may carry size bias).

Output columns:

  • metric: factor name
  • IC: Spearman correlation with FutureProfit
  • abs_IC: \(|\text{IC}|\) (for ranking)
  • p_value: two-tailed significance
  • n_valid: sample size
  • direction: "higher is better" or "lower is better"
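The scan itself is a loop over columns. A sketch using pandas (the matrix layout and output column names follow the text, but the function is illustrative; the coverage check here counts pairwise non-null rows):

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def ic_ranking_scan(matrix: pd.DataFrame, target="FutureProfit",
                    min_coverage=0.30):
    """IC for every numeric metric column, ranked by |IC|."""
    rows = []
    for col in matrix.select_dtypes(include=[np.number]).columns:
        if col == target:
            continue
        pair = matrix[[col, target]].dropna()
        if len(pair) < min_coverage * len(matrix):
            continue                        # minimum data coverage filter
        ic, p = spearmanr(pair[col], pair[target])
        rows.append({"metric": col, "IC": ic, "abs_IC": abs(ic),
                     "p_value": p, "n_valid": len(pair),
                     "direction": "higher is better" if ic > 0
                                  else "lower is better"})
    return (pd.DataFrame(rows)
              .sort_values("abs_IC", ascending=False)
              .reset_index(drop=True))
```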

3.4 Rationale

IC analysis is the first pass in factor evaluation. A factor with near-zero IC is unlikely to add value regardless of how it is incorporated. However, IC alone has limitations:

  • It assumes a linear-in-ranks relationship (a factor could be predictive only in the tails).
  • It does not reveal the magnitude of return separation.
  • It is an aggregate measure and may mask regime- or sector-dependent effects.

These limitations motivate the complementary analyses below.


4. Quintile Analysis

In plain language: Quintile analysis splits all stocks into five equal groups — from the lowest 20% to the highest 20% — based on a metric, then checks the average return of each group. If Group 5 (highest values) consistently beats Group 1 (lowest values), the metric is probably worth using. If the returns go up smoothly from Group 1 to Group 5, that's even better — it means the relationship is reliable across the whole range, not just at the extremes.

4.1 Motivation

While IC summarizes the overall rank relationship in a single number, quintile analysis reveals the shape of the return-metric relationship by grouping stocks into ordered buckets and examining average returns within each.

4.2 Portfolio Formation

Given \(N\) valid observations of metric \(M\), stocks are sorted by \(M\) and divided into 5 equal-frequency groups (quintiles):

\[ Q_j = \left\{ i : P_{(j-1) \cdot 20}(M) < m_i \leq P_{j \cdot 20}(M) \right\} \]

for \(j \in \{1, 2, 3, 4, 5\}\), where \(P_k(M)\) denotes the \(k\)-th percentile of metric \(M\).

  • \(Q_1\) = lowest 20% of metric values
  • \(Q_5\) = highest 20% of metric values

4.3 Return Statistics per Quintile

For each quintile \(Q_j\), we compute:

\[ \bar{r}_j = \frac{1}{|Q_j|} \sum_{i \in Q_j} r_i \quad \text{(mean return)} \]
\[ \tilde{r}_j = \text{median}(\{r_i : i \in Q_j\}) \quad \text{(median return)} \]
\[ \sigma_j = \sqrt{\frac{1}{|Q_j|-1} \sum_{i \in Q_j} (r_i - \bar{r}_j)^2} \quad \text{(standard deviation)} \]

4.4 Quality Metrics

Monotonicity measures whether returns increase (or decrease) consistently from \(Q_1\) to \(Q_5\):

\[ \text{Monotonicity} = \rho_s\left(\{1,2,3,4,5\}, \{\bar{r}_1, \bar{r}_2, \bar{r}_3, \bar{r}_4, \bar{r}_5\}\right) \]

A value near \(+1.0\) indicates a clean, increasing relationship -- the ideal pattern for a long-only factor. A value near \(-1.0\) indicates a consistently decreasing relationship (useful if you invert the signal or use it for shorting).

Q5-Q1 Spread measures the maximum return separation:

\[ \text{Spread} = \bar{r}_5 - \bar{r}_1 \]

This represents the theoretical return from going long the top quintile and short the bottom quintile (a standard factor portfolio in academic research).
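Portfolio formation, per-quintile means, monotonicity, and the spread fit in a short sketch (boundary handling follows the \(P_{(j-1) \cdot 20} < m \leq P_{j \cdot 20}\) convention above; the function name is illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def quintile_analysis(metric, returns):
    """Equal-frequency quintiles by metric value; returns per-bucket mean
    returns, monotonicity (Spearman of bucket index vs mean), and Q5-Q1."""
    metric = np.asarray(metric, float)
    returns = np.asarray(returns, float)
    edges = np.percentile(metric, [20, 40, 60, 80])
    # side="left": a value equal to P_20 lands in Q1, matching m <= P_20
    buckets = np.searchsorted(edges, metric, side="left")   # 0..4
    means = np.array([returns[buckets == j].mean() for j in range(5)])
    monotonicity, _ = spearmanr(np.arange(1, 6), means)
    spread = means[4] - means[0]
    return means, monotonicity, spread
```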

4.5 Interpretation Guide

| Pattern | Interpretation |
| --- | --- |
| Monotonic increase, positive spread | Strong long factor -- higher metric values predict higher returns |
| Monotonic decrease, negative spread | Inverse factor -- lower values are better (or use as a short signal) |
| U-shaped or inverted-U | Non-linear relationship -- consider threshold-based rather than rank-based usage |
| Flat middle, extreme quintile separation | Tail-driven factor -- only extreme values are predictive |
| Noisy, no pattern | Metric is not predictive cross-sectionally |

5. Threshold Analysis

In plain language: Threshold analysis asks a very practical question: "If I only buy stocks where metric X is above (or below) some cutoff, do I beat the market?" It tests many different cutoff values and shows you which ones work best. This is how you turn a research insight into an actual filter for your strategy — for example, discovering that stocks with FCF Yield above 4% tend to outperform.

5.1 Motivation

Quintile analysis assumes a rank-ordered relationship. Threshold analysis instead asks: "If I require a metric to exceed a specific value, how do the selected stocks perform?" This is more directly applicable to building binary buy/don't-buy filters.

5.2 Threshold Sweep

We evaluate a series of thresholds at evenly-spaced percentiles of the metric distribution:

\[ \tau_k = \text{percentile}_k(M), \quad k \in \{5, 10, 15, \ldots, 95\} \]

For each threshold \(\tau_k\) and a given direction (above or below):

\[ S_k = \begin{cases} \{i : m_i > \tau_k\} & \text{if direction = above} \\ \{i : m_i < \tau_k\} & \text{if direction = below} \end{cases} \]

5.3 Statistics at Each Threshold

Mean return of selected stocks:

\[ \bar{r}_{S_k} = \frac{1}{|S_k|} \sum_{i \in S_k} r_i \]

Lift over universe:

\[ \text{Lift}_k = \bar{r}_{S_k} - \bar{r}_{\text{universe}} \]

Mann-Whitney U test: A non-parametric test comparing the return distribution of \(S_k\) against the full universe:

\[ H_0: F_{S_k}(r) = F_{\text{universe}}(r) \quad \text{vs} \quad H_1: F_{S_k}(r) \text{ is stochastically greater} \]

The Mann-Whitney U statistic is:

\[ U = \sum_{i \in S_k} \sum_{j \in \text{universe}} \mathbf{1}[r_i > r_j] \]

A one-sided test is used since we are interested in whether the filter selects outperformers.
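A sketch of the sweep (scipy's `mannwhitneyu` with `alternative="greater"` implements the one-sided test; comparing the selected set against the full universe mirrors the \(U\) formula above):

```python
import numpy as np
from scipy.stats import mannwhitneyu

def threshold_sweep(metric, returns, direction="above"):
    """Percentile thresholds 5..95: mean return of selected stocks,
    lift over the universe, and a one-sided Mann-Whitney p-value."""
    metric = np.asarray(metric, float)
    returns = np.asarray(returns, float)
    universe_mean = returns.mean()
    out = []
    for k in range(5, 100, 5):
        tau = np.percentile(metric, k)
        sel = metric > tau if direction == "above" else metric < tau
        if sel.sum() < 2:
            continue                        # too few stocks to test
        _, p = mannwhitneyu(returns[sel], returns, alternative="greater")
        out.append({"pct": k, "tau": tau, "n": int(sel.sum()),
                    "mean": returns[sel].mean(),
                    "lift": returns[sel].mean() - universe_mean,
                    "p_value": p})
    return out
```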

5.4 Interpreting the Sweep

The output is a table showing how mean returns and significance vary as the threshold becomes more or less restrictive. This reveals:

  • Optimal threshold region: Where lift is maximized with acceptable \(p\)-values.
  • Diminishing returns: Whether making the filter stricter helps or hurts (too few stocks may increase noise).
  • Monotonicity of lift: Whether lift increases steadily or has a "sweet spot."

6. Filter Testing

In plain language: Filter testing lets you combine multiple conditions (e.g., "profitable AND low debt AND growing revenue") and see whether stocks passing all your filters actually outperform the market — with statistical rigor. It tells you not just whether your filtered group does better, but how much better and how confident you can be that the result isn't just luck.

6.1 Purpose

Filter testing evaluates an arbitrary boolean condition applied to the matrix (e.g., requiring ROIC above a threshold and positive free cash flow yield). It provides a comprehensive statistical comparison between the filtered subset and the full universe.

6.2 Statistical Tests

Given:

  • \(\mathcal{F}\) = set of observations passing the filter, with returns \(\{r_i\}_{i \in \mathcal{F}}\)
  • \(\mathcal{U}\) = full universe, with returns \(\{r_j\}_{j \in \mathcal{U}}\)

Welch's \(t\)-test

Tests whether the filtered group has a significantly different mean return from the universe, without assuming equal variances:

\[ t = \frac{\bar{r}_\mathcal{F} - \bar{r}_\mathcal{U}}{\sqrt{\frac{s_\mathcal{F}^2}{n_\mathcal{F}} + \frac{s_\mathcal{U}^2}{n_\mathcal{U}}}} \]

Degrees of freedom via the Welch-Satterthwaite approximation:

\[ \nu = \frac{\left(\frac{s_\mathcal{F}^2}{n_\mathcal{F}} + \frac{s_\mathcal{U}^2}{n_\mathcal{U}}\right)^2}{\frac{s_\mathcal{F}^4}{n_\mathcal{F}^2(n_\mathcal{F}-1)} + \frac{s_\mathcal{U}^4}{n_\mathcal{U}^2(n_\mathcal{U}-1)}} \]

The two-tailed \(p\)-value is converted to one-sided (testing \(\bar{r}_\mathcal{F} > \bar{r}_\mathcal{U}\)):

\[ p_{\text{one-sided}} = \begin{cases} p_{\text{two-sided}} / 2 & \text{if } t > 0 \\ 1 - p_{\text{two-sided}} / 2 & \text{if } t \leq 0 \end{cases} \]

Mann-Whitney U Test

As in threshold analysis, this non-parametric test compares return distributions without assuming normality. Used as a robustness check alongside the \(t\)-test.

Cohen's \(d\) (Effect Size)

Standardized measure of the practical significance of the return difference:

\[ d = \frac{\bar{r}_\mathcal{F} - \bar{r}_\mathcal{U}}{s_{\text{pooled}}} \]

where:

\[ s_{\text{pooled}} = \sqrt{\frac{s_\mathcal{F}^2 + s_\mathcal{U}^2}{2}} \]

| Cohen's \(d\) | Interpretation |
| --- | --- |
| \(< 0.2\) | Negligible effect |
| \(0.2 - 0.5\) | Small effect |
| \(0.5 - 0.8\) | Medium effect |
| \(> 0.8\) | Large effect |

In factor research, even \(d \approx 0.1 - 0.3\) can be economically meaningful when applied across many positions and periods.

Skewness

Fisher-Pearson coefficient of skewness for both groups:

\[ \gamma = \frac{\frac{1}{n} \sum_{i=1}^n (r_i - \bar{r})^3}{\left(\frac{1}{n} \sum_{i=1}^n (r_i - \bar{r})^2\right)^{3/2}} \]

Positive skew (right tail) is desirable for long portfolios -- it means the filter selects stocks with occasional large gains rather than frequent large losses.
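The battery of tests in this section can be sketched with scipy. The one-sided conversion, simple-average pooled s.d., and biased Fisher-Pearson skewness all follow the formulas above; `filter_test` is an illustrative name, not the engine's API.

```python
import numpy as np
from scipy.stats import ttest_ind, skew

def filter_test(filtered_returns, universe_returns):
    """Welch's t (one-sided), Cohen's d, and skewness for both groups."""
    f = np.asarray(filtered_returns, float)
    u = np.asarray(universe_returns, float)
    t, p_two = ttest_ind(f, u, equal_var=False)     # Welch's t-test
    p_one = p_two / 2 if t > 0 else 1 - p_two / 2   # one-sided conversion
    s_pooled = np.sqrt((f.var(ddof=1) + u.var(ddof=1)) / 2)
    d = (f.mean() - u.mean()) / s_pooled            # Cohen's d
    return {"t": float(t), "p_one_sided": float(p_one), "cohens_d": float(d),
            "skew_filtered": float(skew(f)), "skew_universe": float(skew(u))}
```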


7. Conditional IC Analysis

In plain language: Some metrics work great for large companies but not small ones, or in certain sectors but not others. Conditional IC analysis checks whether a metric's predictive power changes depending on a second variable. For example, "Is FCF Yield more predictive for tech stocks or energy stocks?" If the answer is very different, you might want to use that metric only in sectors where it works.

7.1 Motivation

A factor's predictive power may not be uniform across all market conditions, sectors, or size buckets. Conditional IC analysis measures how IC varies when conditioning on a second variable, revealing interactions and regime-dependent behavior.

7.2 Single-Metric Conditional IC

Given a primary metric \(A\) and a conditioning variable \(B\):

Median Split:

\[ \text{IC}_{\text{low}} = \rho_s(A, R \mid B \leq \text{median}(B)) \]
\[ \text{IC}_{\text{high}} = \rho_s(A, R \mid B > \text{median}(B)) \]
\[ \Delta\text{IC} = \text{IC}_{\text{high}} - \text{IC}_{\text{low}} \]

Categorical Split (e.g., sector):

\[ \text{IC}_c = \rho_s(A, R \mid B = c), \quad \forall \; c \in \text{categories}(B) \]

Quintile Split:

\[ \text{IC}_{Q_j} = \rho_s(A, R \mid B \in Q_j(B)), \quad j \in \{1,\ldots,5\} \]
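The median split is a few lines; the categorical and quintile splits follow the same pattern with different group masks (names here are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def conditional_ic_median_split(a, r, b):
    """IC of metric `a` vs returns `r` within the low/high halves of the
    conditioning variable `b`, plus the Delta-IC spread."""
    a, r, b = (np.asarray(x, float) for x in (a, r, b))
    med = np.median(b)
    low, high = b <= med, b > med
    ic_low, _ = spearmanr(a[low], r[low])
    ic_high, _ = spearmanr(a[high], r[high])
    return ic_low, ic_high, ic_high - ic_low
```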

7.3 Conditional IC Scan

Rather than testing one metric at a time, the scan computes the conditional IC split for every metric against a single conditioning variable, then ranks by the spread between best and worst subset.

Median Scan (numeric conditioning variable):

\[ \Delta\text{IC}_M = \text{IC}_{M | B > \text{median}} - \text{IC}_{M | B \leq \text{median}} \]

Ranked by \(|\Delta\text{IC}_M|\). Use this when you want a quick read on whether each metric flips sign or strength across the median of the conditioning variable.

Quintile Scan (numeric conditioning variable):

\[ \text{Spread}_M = \max_{j \in 1..5} \text{IC}_{M | B \in Q_j(B)} - \min_{j \in 1..5} \text{IC}_{M | B \in Q_j(B)} \]

Ranked by \(|\text{Spread}_M|\). Captures non-monotonic dependencies that a median split would average away (e.g. a metric that works in the middle of the distribution but breaks at the tails).

Categorical Scan (sector / industry / country / exchange / MarketCapBand):

\[ \text{Spread}_M = \max_{c \in \text{categories}(B)} \text{IC}_{M | B = c} - \min_{c \in \text{categories}(B)} \text{IC}_{M | B = c} \]

Each category needs at least 20 observations to contribute. Ranked by \(|\text{Spread}_M|\) — large values flag metrics whose predictive power is concentrated in a subset of categories.

Metrics with large spreads are candidates for conditional rules (e.g., "use FCFYield only in small/micro-cap stocks", or "use BookToMarket only in cyclicals").

7.4 Market Cap Band as a Conditioning Variable

Market cap is a continuous variable but the size-effect literature treats it as a small set of size buckets. QuantAscent exposes a virtual MarketCapBand column derived from MarketCap:

| Band | Range |
| --- | --- |
| Mega | > $200B |
| Large | $10B - $200B |
| Mid | $2B - $10B |
| Small | $300M - $2B |
| Micro | < $300M |

MarketCapBand is selectable as a conditioning variable in both single-metric and scan modes — it is treated as categorical with the bands ordered from largest to smallest. Use it instead of raw MarketCap when you want explicit size-band buckets rather than a quantile split. (When MarketCapBand is selected, the raw MarketCap column is excluded from scan-mode metric tests to avoid near-collinear results.)

7.5 Conditional Score IC

This variant measures the composite strategy score's IC within subgroups:

\[ \text{IC}_g = \rho_s(\text{CompositeScore}, R \mid \text{group} = g) \]

Grouping dimensions include:

  • Sector/Industry -- Does the strategy work uniformly across sectors?
  • Country/Exchange -- Geographic robustness check.
  • Market Cap Band -- Size-dependent effectiveness, using the bands defined in §7.4.
  • Year -- Temporal stability of the signal.

Minimum subgroup size of 20 observations is required to compute a meaningful IC.


8. Composite Scoring Systems

In plain language: Once you've identified individual metrics that predict returns, the next step is combining them into a single score. QuantAscent supports two approaches: a simple pass/fail checklist (threshold-based — "does the stock meet this criterion? +1 point") and a continuous ranking system (percentile-based — "how does this stock rank on each metric compared to all others?"). The stocks with the highest composite scores become your portfolio.

8.1 Threshold-Based Scoring (Binary Point System)

Each stock is evaluated against a set of binary criteria. If the criterion is met, 1 point is awarded; otherwise, 0.

\[ \text{Score} = \sum_{k=1}^{K} \mathbf{1}[\text{criterion}_k \text{ is met}] \]

Criteria are organized into thematic tiers (e.g., Value, Quality, Growth, Safety), each contributing a fixed number of points. The total possible score equals \(K\).

Illustrative example -- a hypothetical 6-point scorer:

| Tier | Criterion | Condition | Rationale |
| --- | --- | --- | --- |
| Value | Yield | \(\text{FCFYield} > 0.03\) | Selects stocks trading at a discount to cash generation |
| Value | Earnings | \(\text{EarningsYield} > 0.04\) | Avoids overvalued companies |
| Quality | Profitability | \(\text{CurrentROIC} > 0.05\) | Ensures the business earns above its cost of capital |
| Quality | Cash Backing | \(\text{OperatingCF} > \text{NetIncome}\) | Confirms earnings are backed by real cash flow |
| Growth | Trend | \(\text{Revenue\_GrowthRate} > 0\) | Company is on an improving revenue trajectory |
| Safety | Solvency | \(\text{MeanZscore} > 1.8\) | Filters out financially distressed firms |

The specific metrics, thresholds, and tier weights are user-configurable via the strategy configuration. Researchers typically calibrate thresholds using the IC ranking and threshold sweep tools described in Sections 3 and 5 -- choosing cutoffs where the metric shows statistically significant predictive power.
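As a sketch, the illustrative 6-point scorer from the table maps directly to code (field names mirror the table; a real strategy loads its criteria from the user's configuration rather than hard-coding them):

```python
def threshold_score(stock: dict) -> int:
    """One point per criterion met, per the hypothetical 6-point example."""
    criteria = [
        stock["FCFYield"] > 0.03,
        stock["EarningsYield"] > 0.04,
        stock["CurrentROIC"] > 0.05,
        stock["OperatingCF"] > stock["NetIncome"],
        stock["Revenue_GrowthRate"] > 0,
        stock["MeanZscore"] > 1.8,
    ]
    return sum(criteria)

stock = {"FCFYield": 0.05, "EarningsYield": 0.06, "CurrentROIC": 0.12,
         "OperatingCF": 90.0, "NetIncome": 80.0,
         "Revenue_GrowthRate": 0.08, "MeanZscore": 3.1}
score = threshold_score(stock)   # all six criteria met -> 6
```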

8.2 Percentile-Based Scoring (Continuous Composite)

An alternative scoring approach ranks each metric across the universe and combines weighted percentile ranks into a continuous composite:

\[ \text{CompositeScore} = \sum_{k=1}^{K} \frac{w_k}{\sum w} \cdot \tilde{p}_k \]

where:

\[ \tilde{p}_k = \text{clip}\left(\frac{\text{pctrank}_k(m_k) - \text{min\_pct}_k}{\text{max\_pct}_k - \text{min\_pct}_k} \times 100, \; 0, \; 100\right) \]
  • \(\text{pctrank}_k\) is the cross-sectional percentile rank of metric \(k\) (0 to 100).
  • Direction-aware: for "ascending" metrics, higher raw values get higher percentile ranks; for "descending," lower values rank higher.
  • Optional clipping to a percentile range focuses the scoring on a relevant region of the distribution.
  • \(w_k\) is the user-defined weight for metric \(k\).

The composite score ranges from 0 to 100.
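A sketch of the weighted percentile composite. The `specs` layout is an assumption for illustration; note also that ties in `argsort` are broken by input order here, whereas a production implementation would typically use average ranks.

```python
import numpy as np

def percentile_composite(matrix, specs):
    """matrix: dict of metric name -> values; specs: dict of metric name ->
    (weight, direction, min_pct, max_pct). Returns scores in [0, 100]."""
    n = len(next(iter(matrix.values())))
    total_w = sum(w for w, *_ in specs.values())
    score = np.zeros(n)
    for name, (w, direction, lo, hi) in specs.items():
        vals = np.asarray(matrix[name], float)
        ranks = vals.argsort().argsort()          # 0..n-1
        pct = 100.0 * ranks / (n - 1)             # percentile rank
        if direction == "descending":
            pct = 100.0 - pct                     # lower raw value ranks higher
        tilde = np.clip((pct - lo) / (hi - lo) * 100.0, 0.0, 100.0)
        score += (w / total_w) * tilde
    return score
```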

8.3 Candidate Selection

After scoring, candidate selection proceeds in two modes:

Threshold mode: Try decreasing score thresholds until enough stocks are selected:

\[ \text{candidates} = \{i : \text{Score}_i \geq \tau\}, \quad \tau \in \{K, K-1, K-2, \ldots\} \]

Stop at the first \(\tau\) where the candidate count meets the minimum, then take the top candidates sorted by score (with a configurable tiebreaker metric for equal scores).

Percentile mode: Simply sort by composite score descending and take the top candidates.

Both modes apply hard filters first (market cap range, sector exclusions, etc.) before scoring.
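Threshold-mode selection can be sketched as a descending-cutoff loop (the record layout and `tiebreaker` field name are illustrative):

```python
def select_candidates(scored, min_count, max_score, tiebreaker):
    """Lower the score cutoff until at least min_count stocks qualify,
    then take the top min_count by (score, tiebreaker)."""
    for tau in range(max_score, -1, -1):
        candidates = [s for s in scored if s["score"] >= tau]
        if len(candidates) >= min_count:
            candidates.sort(key=lambda s: (s["score"], s[tiebreaker]),
                            reverse=True)
            return candidates[:min_count]
    return []   # universe too small even at tau = 0
```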


9. Backtest Performance Metrics

9.1 Portfolio Simulation

The backtester runs a period-by-period simulation:

  1. On each rebalance date, compute metrics for all tickers.
  2. Apply filters and scoring to select a portfolio.
  3. Hold an equal-weight portfolio for \(\Delta\) days.
  4. Compute period return as the mean of individual stock returns.
  5. Compound the portfolio value: \(V_{t+1} = V_t \cdot (1 + \bar{r}_t)\).
  6. Track a SPY benchmark over the same periods.

9.2 Return Metrics

Compound Annual Growth Rate (CAGR):

\[ \text{CAGR} = \left(\frac{V_{\text{final}}}{V_{\text{initial}}}\right)^{1/T} - 1 \]

where \(T\) = number of years. A CAGR above the SPY benchmark's CAGR over the same window means the strategy beat the market on an annualized basis; the S&P 500's long-run average is roughly 10%, but the fair comparison is against the benchmark tracked over the identical backtest periods.

Win Rate:

\[ \text{WinRate} = \frac{|\{t : r_t > 0\}|}{|\text{periods}|} \]

9.3 Risk Metrics

Annualized Volatility:

\[ \sigma_{\text{ann}} = \sigma_{\text{period}} \cdot \sqrt{\frac{365}{\Delta}} \]

where \(\sigma_{\text{period}}\) is the sample standard deviation of period returns.

Maximum Drawdown:

\[ \text{MDD} = \max_{t} \left(\frac{\text{Peak}_t - V_t}{\text{Peak}_t}\right) \]

where \(\text{Peak}_t = \max_{s \leq t} V_s\) is the running maximum portfolio value.
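Both risk metrics follow directly from the period-return series. A minimal sketch, with the portfolio value starting at 1.0 and compounding as in §9.1:

```python
import numpy as np

def risk_metrics(period_returns, delta_days=90):
    """Annualized volatility and maximum drawdown from period returns."""
    r = np.asarray(period_returns, float)
    vol_ann = r.std(ddof=1) * np.sqrt(365 / delta_days)
    values = np.cumprod(1 + r)                 # V_t path, V_0 = 1
    peaks = np.maximum.accumulate(values)      # running maximum Peak_t
    mdd = np.max((peaks - values) / peaks)
    return vol_ann, mdd

# Two gains around a -20% period: that single drop is the worst drawdown
vol, mdd = risk_metrics([0.10, -0.20, 0.05])   # mdd ~= 0.20
```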

9.4 Risk-Adjusted Metrics

Sharpe Ratio:

\[ \text{Sharpe} = \frac{\bar{r}_{\text{excess}}}{\sigma_{\text{excess}}} \cdot \sqrt{\frac{365}{\Delta}} \]

where \(r_{\text{excess},t} = r_t - r_f / n_{\text{periods/year}}\) and \(r_f\) is the annualized risk-free rate. A Sharpe above 1.0 is generally considered good; above 2.0 is excellent. Below 0.5 means you're not being compensated well for the risk you're taking.

Sortino Ratio:

\[ \text{Sortino} = \frac{\bar{r} - r_f / n}{\sigma_{\text{down}}} \cdot \sqrt{\frac{365}{\Delta}} \]

where the downside deviation uses only negative excess returns:

\[ \sigma_{\text{down}} = \sqrt{\frac{1}{n} \sum_{t=1}^n \left[\min(0, \; r_t - r_f/n)\right]^2} \]

The Sortino ratio is preferred when the return distribution is asymmetric (positive skew), as it does not penalize upside volatility. Compare Sortino to Sharpe: if Sortino is much higher than Sharpe, your strategy has positive skew (big winners, small losers) — a desirable trait.

Calmar Ratio:

\[ \text{Calmar} = \frac{\text{CAGR}}{\text{MDD}} \]

Measures return per unit of maximum drawdown -- important for assessing whether the strategy's drawdown profile is tolerable. A Calmar above 1.0 means the strategy's annualized return exceeds its worst peak-to-trough drop — you're earning more than you're risking in the worst case.

9.5 CAPM Metrics

Beta (systematic risk):

\[ \beta = \frac{\text{Cov}(r_{\text{algo}} - r_f, \; r_{\text{SPY}} - r_f)}{\text{Var}(r_{\text{SPY}} - r_f)} \]

Alpha (annualized excess return beyond CAPM):

\[ \alpha = \left[\bar{r}_{\text{algo,excess}} - \beta \cdot \bar{r}_{\text{SPY,excess}}\right] \cdot \frac{365}{\Delta} \]

A positive alpha indicates the strategy generates returns not explained by market exposure alone. Alpha is the holy grail of investing — it means your strategy adds value beyond simply riding the market up and down. A beta near 1.0 means your strategy moves roughly in line with the market; below 1.0 means less volatile, above 1.0 means more.
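Both CAPM quantities can be computed from per-period excess returns. A sketch matching the formulas above, with the annual risk-free rate apportioned evenly across periods:

```python
import numpy as np

def capm_alpha_beta(algo_returns, spy_returns, rf_annual=0.0, delta_days=90):
    """Beta and annualized alpha of the strategy vs the SPY benchmark."""
    n_per_year = 365 / delta_days
    algo_ex = np.asarray(algo_returns, float) - rf_annual / n_per_year
    spy_ex = np.asarray(spy_returns, float) - rf_annual / n_per_year
    beta = np.cov(algo_ex, spy_ex, ddof=1)[0, 1] / np.var(spy_ex, ddof=1)
    alpha = (algo_ex.mean() - beta * spy_ex.mean()) * n_per_year
    return alpha, beta

# Strategy = market + 1% per period: beta is 1, alpha is 1% x (365/90) per year
spy = [0.02, -0.01, 0.03, 0.00]
alpha, beta = capm_alpha_beta([r + 0.01 for r in spy], spy)
```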

9.6 Other Metrics

Profit Factor:

\[ \text{PF} = \frac{\sum_{t: r_t > 0} r_t}{\left|\sum_{t: r_t < 0} r_t\right|} \]

A profit factor \(> 1\) means total gains exceed total losses. Values above 2.0 are strong. Think of it as "for every dollar I lose, how many dollars do I make?" A profit factor of 1.5 means $1.50 gained for every $1.00 lost.


10. Metric Computation Reference

10.1 Altman Z-Score

A composite measure of financial health:

\[ Z = 1.2 \cdot \frac{\text{WC}}{\text{TA}} + 1.4 \cdot \frac{\text{RE}}{\text{TA}} + 3.3 \cdot \frac{\text{EBIT}}{\text{TA}} + 0.6 \cdot \frac{\text{MV}}{\text{TL}} + 1.0 \cdot \frac{\text{Rev}}{\text{TA}} \]

| Symbol | Meaning |
| --- | --- |
| WC | Working Capital (Current Assets - Current Liabilities) |
| TA | Total Assets |
| RE | Retained Earnings |
| EBIT | Earnings Before Interest & Taxes |
| MV | Market Value of Equity |
| TL | Total Liabilities |
| Rev | Revenue |

Interpretation: \(Z > 2.99\) = safe zone, \(1.81 < Z < 2.99\) = grey zone, \(Z < 1.81\) = distress zone.
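The score and zone classification are a direct transcription of the formula (argument names follow the symbol table; function names are illustrative):

```python
def altman_z(wc, ta, re, ebit, mv, tl, rev):
    """Altman Z-Score with the coefficients from the formula above."""
    return (1.2 * wc / ta + 1.4 * re / ta + 3.3 * ebit / ta
            + 0.6 * mv / tl + 1.0 * rev / ta)

def z_zone(z):
    """Safe / grey / distress classification per the cutoffs above."""
    if z > 2.99:
        return "safe"
    if z > 1.81:
        return "grey"
    return "distress"

# Example: a profitable, lightly levered firm lands in the safe zone
z = altman_z(wc=100, ta=1000, re=300, ebit=150, mv=800, tl=500, rev=1200)
```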

10.2 Metric Categories

The 120+ computed metrics fall into these categories:

| Category | Examples | Count |
| --- | --- | --- |
| Valuation | P/E, P/B, P/S, EV/EBITDA, FCFYield, EarningsYield, GrahamNumber | 15 |
| Profitability | Gross, operating, net, EBIT, cash flow margins, IncomeQuality | 6 |
| Returns | ROE, ROIC, ROCE, ReturnOnTangibleAssets (current, previous, mean) | 8 |
| Growth Rates | Revenue, EPS, FCF, OperatingIncome, margins, ROIC, Z-Score growth | 16 |
| Growth R-squared | Trend consistency (\(R^2\)) for revenue, EPS, FCF, margins, and more | 15 |
| Cash Flow | OperatingCF, CapEx, SBC, working capital changes, buybacks, dividends | 8 |
| Yield & Shareholder | DividendYield, PayoutRatio, BuybackYield, TotalShareholderYield | 4 |
| Safety & Leverage | Altman Z-Score, Debt/Equity, InterestCoverage, CurrentRatio, QuickRatio | 9 |
| Efficiency | DSO, DIO, CashConversionCycle, CapexToRevenue, SBCtoRevenue, R&D/Rev | 10 |
| Size & Capital | MarketCap, EnterpriseValue, TotalDebt, NetDebt, Equity, Intangibles | 13 |
| Per Share & Income | EPS, BookValue/Share, Revenue/Share, Revenue, NetIncome, GrossProfit | 10 |
| Technical | CurrentPrice | 1 |

10.3 Data Quality Requirements

Before computing metrics for a given (ticker, date), the following minimums must be met:

  • Sufficient price history available
  • At least 10 income statement records
  • At least 10 balance sheet records
  • Positive shares outstanding
  • Valid (positive) current price

Observations failing these checks are excluded from the matrix entirely.


Appendix: Why These Specific Tests?

The choice of statistical tools is deliberate:

| Analysis | What It Reveals | Limitation Addressed |
| --- | --- | --- |
| IC (Spearman) | Overall rank-monotonic predictiveness | First-pass screening of 120+ metrics |
| Quintile | Shape of the return-metric relationship | IC may miss non-linearities |
| Threshold Sweep | Optimal cutoff for binary filters | Quintile doesn't map directly to buy/sell rules |
| Filter Test | Combined multi-filter performance | Individual metric tests ignore interactions |
| Conditional IC | Regime/sector-dependent signal strength | Unconditional IC may average away conditional effects |
| Welch's \(t\)-test | Mean difference significance (parametric) | Assumes approximate normality |
| Mann-Whitney \(U\) | Distributional difference (non-parametric) | Robust check on \(t\)-test |
| Cohen's \(d\) | Practical (economic) significance | Statistical significance \(\neq\) practical importance |

By applying multiple complementary analyses, we build converging evidence about factor quality and reduce the risk of overfitting to a single statistical artifact.


Glossary

| Term | Definition |
| --- | --- |
| IC (Information Coefficient) | A measure of how well a metric's rankings predict future stock returns. Higher absolute values mean stronger predictive power. |
| Spearman correlation | A rank-based correlation that measures whether two variables move together in order (not necessarily in a straight line). More robust than standard (Pearson) correlation for financial data. |
| Quintile | One of five equal-sized groups when all stocks are sorted by a metric. Q1 is the bottom 20%, Q5 is the top 20%. |
| CAGR (Compound Annual Growth Rate) | The smoothed annual return that gets you from the starting value to the ending value. Accounts for compounding. |
| Sharpe Ratio | Return per unit of total risk (volatility). Higher is better. Above 1.0 is good; above 2.0 is excellent. |
| Sortino Ratio | Like Sharpe, but only penalizes downside volatility. Rewards strategies with big gains and small losses. |
| Alpha | The portion of a strategy's return that can't be explained by market movements. Positive alpha = the strategy adds value beyond just tracking the market. |
| Beta | How much a strategy moves relative to the market. Beta of 1.0 = moves with the market. Below 1.0 = less volatile. Above 1.0 = more volatile. |
| Max Drawdown | The largest peak-to-trough decline in portfolio value. Measures worst-case pain. |
| p-value | The probability that a result as extreme as the observed one could occur by chance. Below 0.05 is conventionally considered statistically significant. |
| Effect size (Cohen's d) | A standardized measure of how large a difference is, independent of sample size. Even small effect sizes (0.1–0.3) can be economically meaningful in investing. |
| Forward return (FutureProfit) | The actual stock return over the next rebalance period (default 90 days). This is what every analysis is trying to predict. |

Further Reading

  • Product Overview — Feature summary and getting started
  • Screen Guide — Detailed walkthrough of the Research tab and every other screen in the app
  • Setup Guide — Installation, IBKR configuration, and first-launch walkthrough