Abstract

Every major asset class, institutional strategy, and professional investment mandate has a benchmark. Retail investor portfolios, the financial vehicles held by approximately 165 million Americans representing trillions in assets under management, do not. Not in any meaningful sense.

The S&P 500 is not a retail investor benchmark. It is an institutional equity index that has become a retail default through familiarity, not fitness. A balanced 60/40 portfolio compared to the S&P 500 during a bull equity market will consistently appear to underperform. The same portfolio during a bear market will appear to outperform. Neither comparison reflects what the investor actually needs to know: how does my portfolio perform relative to real investors managing money under the same constraints, in the same risk category, at the same time?

That question has never had a reliable answer. This paper examines why, and proposes an architecture that can provide one.

Part One: The Benchmark Gap Is Structural, Not Accidental

The absence of a meaningful retail investor benchmark is not a data problem. The data exists. Every custodian holds complete daily account-level data for every portfolio they service. Wealth management firms have always had the information required to produce verified peer comparisons within their own client base.

The gap is structural. It has three documented causes.

First: technical fragmentation. Retail portfolios are distributed across thousands of custodians and account types: brokerages, banks, and 401(k) administrators, holding IRAs and taxable accounts alike. Each custodian produces data in proprietary formats with different identifiers, different settlement conventions, and different data delivery schedules. No single custodian has cross-custodial visibility. Building a benchmark from real retail portfolios requires solving a multi-custodial aggregation problem at consumer scale, a problem that did not have a reliable technical solution until recently.

Second: structural conflict of interest. The SEC has stated in published guidance that broker-dealers and investment advisers have documented economic incentives to recommend products that generate more revenue for the firm, even when better alternatives exist for the client. A verified cross-custodial performance benchmark creates objective accountability for advisor and firm performance. Firms that manage money have no economic incentive to build a tool that independently measures how well they manage it. The incentive to maintain the benchmark gap is real and persistent.

Third: benchmark industry economics. The organizations that maintain the indexes retail investors are currently given, including S&P Global, MSCI, FTSE Russell, and Bloomberg, generate revenue by licensing those indexes to the funds that track them. Approximately $15 trillion in passive assets globally track indexes controlled by a small number of private organizations. A retail peer benchmark built from actual investor portfolios would reduce dependency on licensed institutional indexes for retail performance evaluation. The incumbent economic model is served by the current state.

None of these structural causes require malicious intent to persist. They are the predictable outcome of a system in which every party capable of building a retail peer benchmark has a financial interest in not building one.

Part Two: Why the S&P 500 Fails as a Retail Benchmark, and Why Blended Alternatives Don't Fix It

The S&P 500 as a retail performance benchmark fails on its own terms, independent of any competing alternative.

It is a price-return index of 500 large-cap US equities selected by a private committee. Its construction is explicitly institutional. It carries no fees, no taxes, no cash drag, no advisor cost, and no behavioral friction. A retail investor comparing their actual portfolio, which carries all of those costs, to the S&P 500 is not receiving performance attribution. They are receiving a comparison between a real-world outcome and a frictionless theoretical construct. Because the index number excludes every cost the investor actually bears, the comparison systematically misrepresents how the investor's real-world portfolio performed.

The commonly proposed alternative, a custom blended benchmark combining S&P 500 exposure with fixed income indexes weighted to match the investor's allocation, is more sophisticated but inherits the same fundamental problem. It is still a frictionless construct. An 80/20 blended benchmark assumes perfect rebalancing at zero cost, no cash drag, no behavioral timing errors, and no tax events. It measures what a theoretical portfolio with that allocation would have returned, not what real investors with that allocation actually experienced.

The difference between a blended index return and what real 80/20 investors actually earned is not noise. It is the accumulated cost of real-world friction: advisor fees, timing decisions, tax events, behavioral errors. The blended index is specifically constructed to exclude all of it.

No version of a constructed index benchmark can close this gap. The friction is real. It belongs in the benchmark.

Part Three: What a Meaningful Retail Benchmark Actually Requires

A benchmark that meaningfully answers the question "how am I doing relative to real investors like me" requires four things that no constructed index provides.

One: Real portfolio data. Not modeled portfolios. Not theoretical allocations. Actual holdings, actual weights, actual positions, from real investor accounts at real custodians.

Two: Cash-flow neutralization at the account level. Raw asset value changes conflate investment performance with external cash flows: contributions, withdrawals, and distributions paid out of the account. A meaningful return calculation must isolate performance from cash movement at the individual account level before aggregating across accounts. Modified Dietz, applied over short sub-periods and linked so that it approximates a time-weighted return, and true time-weighted methodologies accomplish this. The key requirement is that neutralization happens before aggregation, not after.
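As a minimal sketch of account-level cash-flow neutralization, the following Python function computes a Modified Dietz return for one account over one period. The names (CashFlow, modified_dietz), the day-offset convention, and the sample figures are illustrative assumptions, not a description of any particular platform's implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CashFlow:
    day: int        # days elapsed since the start of the period (assumed convention)
    amount: float   # positive = contribution, negative = withdrawal


def modified_dietz(begin_value: float, end_value: float,
                   flows: Sequence[CashFlow], period_days: int) -> float:
    """Return for one account over one period, net of external cash flows."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    if any(f.day < 0 or f.day > period_days for f in flows):
        raise ValueError("cash flow falls outside the period")

    net_flow = sum(f.amount for f in flows)
    # Each flow is weighted by the fraction of the period it was invested.
    weighted_flow = sum(f.amount * (period_days - f.day) / period_days
                        for f in flows)
    average_capital = begin_value + weighted_flow
    if average_capital <= 0:
        raise ValueError("average capital must be positive")

    # Gain excludes money the investor added or removed.
    return (end_value - begin_value - net_flow) / average_capital


# Hypothetical account: starts at 100,000, receives a 5,000 contribution
# halfway through a 30-day period, and ends at 106,000.
example = modified_dietz(100_000, 106_000, [CashFlow(day=15, amount=5_000)], 30)
# Gain is 1,000 on average capital of 102,500, i.e. roughly 0.98%.
```

The raw change in account value here is 6%, but only 1,000 of the 6,000 increase is investment performance. That distinction is what must be settled per account before any averaging across accounts.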

Three: Allocation-based categorization. Portfolios must be grouped by actual allocation, not by stated objective, not by fund label, but by the realized stock/bond composition of the actual holdings. This is the only basis for a genuinely apples-to-apples peer comparison. An 80/20 portfolio belongs in an 80/20 category regardless of what its advisor called it or what model portfolio it was supposed to track.

Four: Dynamic category assignment. Portfolio allocations drift. An account that was 80/20 six months ago may be 65/35 today due to market movement or rebalancing decisions. A meaningful benchmark requires that categorization reflect actual current composition, updated continuously, so that every period's benchmark calculation uses the period's actual allocation data rather than a stale classification.

These four requirements are achievable with current technology. Consumer-permissioned data access via API-connected custodian feeds, Modified Dietz return calculation at the account level, and daily recategorization based on end-of-day holdings data satisfy all four.

Part Four: A Proposed Category Architecture and Its Statistical Defensibility

One viable architecture organizes connected portfolios into nine allocation categories across three parent groups.

Growth: 100% equity, 80/20 equity/fixed income, 70/30
Balanced: 60/40, 50/50, 40/60
Income: 30/70, 20/80, 10/90

Every portfolio is assigned to a category based on its actual realized allocation, recalculated daily from end-of-day custodian data. Category assignment changes automatically when allocation drifts across a category boundary. This is not a modeling assumption. It is a direct read of what the portfolio actually holds at the close of each trading day.
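The daily assignment step can be sketched in Python as follows. The paper lists the nine categories but does not specify where the boundaries between them fall, so this sketch assumes a nearest-target rule on the equity weight; that rule, the names Category and assign_category, and the treatment of the portfolio as equity plus fixed income only are all assumptions made for illustration.

```python
from typing import NamedTuple


class Category(NamedTuple):
    group: str
    label: str
    equity_target: float  # target equity share of the portfolio


# The nine categories from the text, ordered from most to least equity.
CATEGORIES = (
    Category("Growth", "100% equity", 1.00),
    Category("Growth", "80/20", 0.80),
    Category("Growth", "70/30", 0.70),
    Category("Balanced", "60/40", 0.60),
    Category("Balanced", "50/50", 0.50),
    Category("Balanced", "40/60", 0.40),
    Category("Income", "30/70", 0.30),
    Category("Income", "20/80", 0.20),
    Category("Income", "10/90", 0.10),
)


def assign_category(equity_value: float, fixed_income_value: float) -> Category:
    """Assign a portfolio to the category whose equity target is nearest.

    Assumed rule: nearest target wins. Exact midpoints between two targets
    would need an explicit tie-break policy in a real system.
    """
    total = equity_value + fixed_income_value
    if equity_value < 0 or fixed_income_value < 0 or total <= 0:
        raise ValueError("holdings must be non-negative with a positive total")
    equity_weight = equity_value / total
    return min(CATEGORIES, key=lambda c: abs(c.equity_target - equity_weight))


# Re-running the assignment on each day's end-of-day holdings captures drift:
# a hypothetical account at 80/20 that later sits at 62/38 changes category.
before = assign_category(80_000, 20_000)
after = assign_category(62_000, 38_000)
```

Because the function is stateless and reads only the day's holdings, category membership is recomputed from scratch each day rather than carried forward from a prior classification.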

The benchmark for each category is the arithmetic mean of the cash-flow-neutralized daily returns of all portfolios in that category, calculated at the portfolio level before aggregation. This produces a number that represents what real investors in that category actually earned over any given period, including all real-world friction.
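A minimal Python sketch of that aggregation follows, assuming each portfolio's daily return has already been cash-flow neutralized and each portfolio has already been assigned a category for the day. The function names and sample values are hypothetical. Geometric linking of the daily category means into a multi-day figure is also an assumption, since the text does not state how longer periods are compounded.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def category_benchmarks(
    daily_records: Iterable[Tuple[str, float]],
) -> Dict[str, float]:
    """Equal-weighted mean daily return per category for one trading day.

    daily_records holds one (category_label, neutralized_daily_return)
    pair per portfolio.
    """
    by_category: Dict[str, List[float]] = defaultdict(list)
    for label, daily_return in daily_records:
        by_category[label].append(daily_return)
    return {label: fmean(returns) for label, returns in by_category.items()}


def link_returns(daily_benchmark_returns: Iterable[float]) -> float:
    """Compound a sequence of daily benchmark returns into a period return."""
    growth = 1.0
    for r in daily_benchmark_returns:
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical single day: two 80/20 portfolios and one 60/40 portfolio.
one_day = category_benchmarks([("80/20", 0.01), ("80/20", 0.03), ("60/40", -0.02)])
```

An arithmetic mean weights every portfolio equally regardless of account size, so a 10,000 account and a 10 million account contribute the same amount to the category figure.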

On statistical defensibility at the category level: the standard sample-size formula, at 95% confidence with a 5% margin of error and the most conservative variance assumption, gives a threshold of approximately 384 portfolios per category for a statistically reliable benchmark. At 1,000 portfolios per category the confidence interval tightens substantially. This is a meaningful but achievable threshold, not the millions-of-accounts scale often assumed necessary for financial benchmarks. The benchmark is not a survey estimate. It is a population measurement of connected accounts, which means the statistical requirements are different from, and more favorable than, survey-based data collection.
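The 384 figure can be reproduced with the textbook sample-size formula n = z^2 * p * (1 - p) / e^2, evaluated at p = 0.5. The sketch below is illustrative only, and its function names are assumptions. Note that this formula is strictly for estimating a proportion. For a mean return, the margin of error also depends on how dispersed returns are within the category, which the second function illustrates with a hypothetical dispersion value.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(confidence: float = 0.95,
                               margin: float = 0.05,
                               p: float = 0.5) -> float:
    """Textbook sample size for estimating a proportion (p = 0.5 is worst case)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 at 95%
    return z * z * p * (1 - p) / (margin * margin)


def margin_of_error_for_mean(sigma: float, n: int,
                             confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sigma / math.sqrt(n)


n_required = sample_size_for_proportion()   # about 384.1
n_rounded_up = math.ceil(n_required)        # 385 when rounded up

# With a hypothetical cross-sectional standard deviation of 1% in daily
# returns, moving from 384 to 1,000 portfolios shrinks the interval by a
# factor of sqrt(384 / 1000), roughly 0.62.
shrink = margin_of_error_for_mean(0.01, 1000) / margin_of_error_for_mean(0.01, 384)
```

Under these assumptions, the interval around a category mean narrows with the square root of the number of portfolios, so going from 384 to 1,000 accounts cuts its width by a little over a third.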

One important caveat worth stating directly: the portfolios that connect to any platform built on this architecture are those whose holders chose to connect them. This is not a random sample of all retail investors. It is a self-selected population of investors who sought out a peer benchmarking tool. As any such platform scales, this selection dynamic diminishes in practical significance, but it is the honest characterization of the dataset at any point in time. The benchmark measures what connected investors in each category actually earned. It does not claim to measure what all retail investors in the US with that allocation earned, and any such claim would be an overclaim not supported by the methodology.

What it does claim, and what is fully supported, is that it provides the most empirically grounded available comparison for an investor asking "how am I doing relative to real investors managing money like mine?" No constructed index can provide that. No blended benchmark can provide that. Only real portfolio data from real accounts can.

Part Five: What This Changes for Wealth Management

The implications of a functioning retail peer benchmark extend beyond individual investor utility. For the wealth management industry, a verified cross-custodial peer benchmark introduces an accountability layer that has not previously existed at the retail level.

Currently, the fiduciary standard exists in regulatory language without an empirical measurement tool to evaluate compliance in practice. An advisor can satisfy the fiduciary standard on paper while delivering consistently below-average returns relative to peers managing money under identical constraints. There is no independent mechanism to detect this at the retail level because there has been no independent benchmark against which to measure it.

A functioning retail peer benchmark does not adjudicate fiduciary compliance. It provides the data that makes meaningful evaluation possible for the first time. The question "is my advisor doing a good job" has always had two components. The subjective component, covering communication style, responsiveness, and planning quality, has always been evaluable over time. The objective component, covering investment performance relative to peers, has not, because the benchmark required to evaluate it did not exist.

For fintech builders and financial infrastructure providers, the emergence of a cross-custodial retail performance dataset creates several downstream possibilities: performance-based advisor matching, benchmark-integrated financial planning tools, AI-powered portfolio analysis calibrated against real peer outcomes rather than theoretical models, and regulatory reporting infrastructure that reflects real-world investor experience rather than constructed proxies.

Part Six: On the Role of Open Banking Infrastructure

The technical precondition for this architecture is consumer-permissioned data access at custodian scale. The CFPB's Section 1033 Personal Financial Data Rights rule, finalized in October 2024 and currently under reconsideration, is the regulatory expression of this precondition. Its current legal status is uncertain. Its direction is not.

The API infrastructure required to aggregate consumer-permissioned financial data at scale exists independently of the regulatory outcome. Plaid, MX, Finicity, and similar aggregators have normalized account connectivity to the point where linking a financial account via API is a routine consumer action across hundreds of millions of accounts. The regulatory fight over Section 1033 is about who controls the economics of that data access, not about whether the access itself is technically possible.

The architecture described in this paper is functional under the current regulatory state and becomes more robust as open banking standards mature, regardless of the specific outcome of Section 1033 litigation.

Conclusion: The Argument the Methodology Makes

The case for a retail peer benchmark does not rest on the inadequacy of existing benchmarks as products. It rests on the fundamental observation that a constructed index and a real-world peer comparison are different kinds of things, not better and worse versions of the same thing.

The S&P 500 tells an investor how a frictionless theoretical equity portfolio performed. A retail peer benchmark tells an investor how real people managing money under their same constraints actually did. These are different questions. They have different answers. Both are legitimate. Only one has ever been available to retail investors.

The architecture required to produce the second answer is now buildable. The statistical methodology is sound. The technical infrastructure exists. The regulatory environment is moving in a direction that supports it even if it does not yet mandate it.

What is being proposed here is not a replacement for institutional benchmarks or a critique of the index industry. It is the addition of a category of measurement that has always been missing from the retail investor's toolkit. Not because it was impossible. Because the incentive to build it has always resided with the investor rather than the industry.

That is the argument. The methodology carries it.

About the Author

Shawn Tierney spent ten years as a financial advisor across wirehouses, independent practice, and the bank channel from 1998 to 2008, working with retail investors at every income and education level. He is a two-time software founder whose first company was acquired.

This paper is for informational and discussion purposes only and does not constitute financial, legal, or investment advice.

The Missing Benchmark: Why Retail Investors Have Never Had a Meaningful Peer Comparison, and What the Architecture of One Actually Requires