How a Comp-Matching Engine Works (and Why Spreadsheets Can't Keep Up)

Maya Okonkwo

Head of Data & Integrations, Cremdeal

The phrase "comp-matching engine" gets used loosely in the CRE technology market — sometimes to describe something sophisticated, sometimes to describe a filtered CoStar export with a fancier interface. It's worth being specific about what the underlying logic of a genuine comp-matching system involves, and where the process breaks down when the same work is done in a spreadsheet.

A comp-matching engine, at its core, is a system that takes a subject lease — a specific requirement with defined parameters — and returns a ranked set of comparable executed transactions from a historical database, filtered and scored against the subject's attributes. That definition sounds straightforward. Implementing it correctly against CRE transaction data requires addressing a series of problems that manual spreadsheet workflows handle inconsistently.

The input: defining the subject lease

A comp match starts with a subject lease specification. The minimum useful specification includes: submarket or micro-market, building class (Class A, B, or C), SF range (typically applied as a band around the target size — e.g., 70–130% of the target), lease structure (direct versus sublease, NNN versus gross versus modified gross), and deal date window (how many months back the comp pool extends).

More precise specifications add: load factor range or a specific common area factor target, floor plate preference, building vintage (relevant for Class B versus older Class A), tenant improvement condition at time of lease (as-built versus spec-suite versus landlord's work), and lease term range (short-term versus long-term lease economics differ).
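As a rough illustration, the specification above can be captured as a small immutable record. This is a minimal sketch, not a description of any particular product: the class name, field names, and the 24-month default lookback are assumptions made for the example, and only the 70–130% size band comes from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class SubjectLease:
    """Hypothetical subject-lease specification; field names are illustrative."""
    submarket: str
    building_class: str                  # "A", "B", or "C"
    target_sf: int
    lease_type: str                      # "direct" or "sublease"
    rent_structure: str                  # "NNN", "gross", or "modified_gross"
    lookback_months: int = 24            # assumed default, not from the article
    sf_band: Tuple[float, float] = (0.70, 1.30)           # 70-130% of target
    term_months_range: Optional[Tuple[int, int]] = None   # optional refinement

    def sf_range(self) -> Tuple[int, int]:
        """Size band in square feet around the target."""
        low, high = self.sf_band
        return round(self.target_sf * low), round(self.target_sf * high)

    def earliest_deal_date(self, as_of: date) -> date:
        """First day of the month `lookback_months` before `as_of`."""
        months = as_of.year * 12 + (as_of.month - 1) - self.lookback_months
        return date(months // 12, months % 12 + 1, 1)


subject = SubjectLease("Las Colinas", "A", 15_000, "direct", "NNN")
print(subject.sf_range())
print(subject.earliest_deal_date(date(2025, 6, 15)))
```

Because the record is frozen and compared by value, two analysts who enter the same parameters get an identical specification, which is the starting point for a repeatable comp pool.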

In a spreadsheet workflow, this specification is applied as a series of filter steps. In a matching engine, it's encoded as a query with parameter weights. The practical difference is repeatability: a spreadsheet filter applied by two different analysts on two different days may not produce the same comp pool because the filter logic isn't recorded. A matching engine applies the same logic every time for the same parameters.

The filtering step: what gets excluded and why

Filtering is where manual comp processes introduce the most variability. A raw CoStar pull for a Dallas office submarket with basic size and date filters returns a pool of transactions that includes renewals, subleases, short-term licenses, and deals in buildings that aren't genuinely comparable to the subject property. Each exclusion category requires a judgment call.

Renewal comps require careful handling. A lease renewal in a well-maintained Class A building, where the tenant negotiated from a position of convenience rather than market optionality, may reflect a below-market rent driven by retention economics rather than market clearing. Including renewals in a comp set without noting their nature will pull the effective rent benchmark down in ways that misrepresent where new deal economics actually are. Renewal comps are not useless — they're market data — but they need to be labeled and used with appropriate context.

Sublease comps require similar treatment. Sublease space typically prices at a discount to direct space because the sublessor is motivated by cost recovery rather than rent optimization, the lease term is constrained by the head lease expiration, and the sublessee carries the risk that the head lease ends early. A comp set that blends direct and sublease transactions without distinguishing them will misrepresent direct-deal economics, particularly in a market where sublease availability is elevated, which has been the case in segments of the Dallas CBD and Uptown since 2022.

A matching engine can encode these exclusion rules as defaults, with analyst override capability for cases where including the excluded category is analytically appropriate. A spreadsheet process relies on the analyst remembering to apply the exclusions every time.
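One way to picture exclusion defaults with analyst override is a small policy table that every run starts from. The sketch below is illustrative only: the `Comp` type, the policy names, and the choice to exclude short-term licenses by default are assumptions, while flagging renewals and excluding subleases follows the treatment described in this article.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

# Assumed default policy: "exclude" drops the comp, "flag" keeps it with a label.
# Any deal type not listed here is included without a label.
DEFAULT_POLICY: Dict[str, str] = {
    "renewal": "flag",
    "sublease": "exclude",
    "short_term_license": "exclude",
}


@dataclass(frozen=True)
class Comp:
    comp_id: str
    deal_type: str               # "new", "renewal", "sublease", "short_term_license"
    flags: Tuple[str, ...] = ()


def apply_exclusions(
    comps: List[Comp], overrides: Optional[Dict[str, str]] = None
) -> Tuple[List[Comp], List[str]]:
    """Return the kept comps plus a log line for every flag or exclusion."""
    policy = {**DEFAULT_POLICY, **(overrides or {})}
    kept: List[Comp] = []
    log: List[str] = []
    for comp in comps:
        action = policy.get(comp.deal_type, "include")
        if action == "exclude":
            log.append(f"{comp.comp_id}: excluded ({comp.deal_type})")
            continue
        if action == "flag":
            # Keep the comp, but label it so it is never read as a new direct deal.
            comp = replace(comp, flags=comp.flags + (comp.deal_type,))
            log.append(f"{comp.comp_id}: kept, flagged as {comp.deal_type}")
        kept.append(comp)
    return kept, log


pool = [
    Comp("c1", "new"),
    Comp("c2", "renewal"),
    Comp("c3", "sublease"),
    Comp("c4", "short_term_license"),
]
primary, log = apply_exclusions(pool)
# Analyst override: bring subleases back in, labeled, for a sublease-heavy submarket.
with_subleases, _ = apply_exclusions(pool, overrides={"sublease": "flag"})
```

The defaults apply whether or not anyone remembers them, and an override is an explicit argument that shows up in the call rather than a filter someone forgot to set.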

The scoring step: ranking within the filtered pool

Once the raw comp pool has been filtered, a matching engine ranks the remaining transactions by comparability to the subject. The ranking logic typically weights attribute proximity: a comp with SF within 10% of the subject, in the same micro-market, with the same lease structure, closed within 12 months, ranks higher than a comp that matches on building class but is 40% larger and two years old.

The specific weighting logic matters, and it is where matching engines differ from each other. The variables that tend to carry the most weight in office lease comp matching are recency (recent deals reflect current market conditions), submarket proximity (within a defined micro-market versus a broader submarket boundary), size proximity, and lease structure (NNN and gross deals should not be mixed without rent conversion). Building quality and amenity level are harder to systematize: a score based purely on CoStar building class designation will sometimes group buildings that experienced brokers would not treat as comparable.
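To make the idea of weighted attribute proximity concrete, here is a minimal scoring sketch. The weights, the 36-month age cutoff, the 30% size-gap cutoff, and the micro-market labels are all assumptions chosen for illustration, not values from any real engine. The sketch also scores a lease-structure mismatch as zero rather than attempting a rent conversion.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Illustrative weights only; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "recency": 0.35, "location": 0.30, "size": 0.20, "structure": 0.15,
}


@dataclass(frozen=True)
class Deal:
    name: str
    micro_market: str
    submarket: str
    sf: int
    rent_structure: str
    months_since_close: int = 0


def score_components(
    subject: Deal, comp: Deal, max_age_months: int = 36, max_sf_gap: float = 0.30
) -> Dict[str, float]:
    """Per-attribute similarity in [0, 1]; returned so the ranking can be explained."""
    recency = max(0.0, 1 - comp.months_since_close / max_age_months)
    if comp.micro_market == subject.micro_market:
        location = 1.0
    elif comp.submarket == subject.submarket:
        location = 0.5
    else:
        location = 0.0
    sf_gap = abs(comp.sf - subject.sf) / subject.sf
    size = max(0.0, 1 - sf_gap / max_sf_gap)
    structure = 1.0 if comp.rent_structure == subject.rent_structure else 0.0
    return {"recency": recency, "location": location, "size": size, "structure": structure}


def rank(subject: Deal, comps: List[Deal]) -> List[Tuple[float, str, Dict[str, float]]]:
    """Return (total score, comp name, component breakdown), best match first."""
    rows = []
    for comp in comps:
        parts = score_components(subject, comp)
        total = sum(WEIGHTS[k] * v for k, v in parts.items())
        rows.append((total, comp.name, parts))
    return sorted(rows, key=lambda r: r[0], reverse=True)


# Hypothetical micro-market labels ("LC-core", "LC-fringe") for the example.
subject = Deal("subject", "LC-core", "Las Colinas", 15_000, "NNN")
near = Deal("near", "LC-core", "Las Colinas", 16_000, "NNN", months_since_close=8)
far = Deal("far", "LC-fringe", "Las Colinas", 21_000, "NNN", months_since_close=24)
ranked = rank(subject, [far, near])
```

Returning the component breakdown alongside the total is a deliberate choice: it lets an analyst see which attribute pushed a comp up or down before deciding whether to override the ranking.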

A well-designed matching engine is transparent about its scoring logic. The analyst should be able to see why a specific comp ranked where it did and override the ranking when broker judgment suggests a different weighting for the specific subject lease. The goal isn't to replace broker judgment — it's to make the baseline comp set faster and more repeatable so broker judgment can be applied at the right step.

Where spreadsheets actually break down

We're not arguing that spreadsheet-based comp work is always wrong. Experienced analysts using well-structured spreadsheets with consistent methodology can produce accurate comp sets. The failure modes are specific to workflow conditions that are common in regional brokerage operations.

The first is deadline pressure. When a tenant rep has 48 hours to support an LOI, the steps most likely to get abbreviated are the careful exclusion of renewals and subleases, the documentation of why specific comps were included or excluded, and the cross-referencing of CoStar data against CompStak concession figures. A matching engine that automates those steps doesn't eliminate deadline pressure, but it compresses the time required to do the baseline work correctly.

The second is comp staleness. A comp set built in a spreadsheet six months ago for a similar requirement doesn't automatically update when new transactions close in the submarket. A matching engine pulling from a live data connection reflects current transaction history without requiring the analyst to rebuild the comp set from scratch each time.

The third is audit trail. In a dispute or negotiation, being able to document the methodology behind a comp set — which filters were applied, which exclusions were made and why, which data source was used — adds analytical credibility. A spreadsheet's methodology is often implicit rather than documented. A matching engine creates a repeatable, documentable process.
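An audit trail can be as simple as a structured record written alongside every comp set. The sketch below assumes a hypothetical record layout (`build_audit_record` and its field names are not from any real product) and uses a short hash of the canonicalized parameters so that two runs with the same inputs can be matched later.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional


def build_audit_record(
    params: Dict[str, object],
    exclusions: List[str],
    data_source: str,
    run_at: Optional[datetime] = None,
) -> Dict[str, object]:
    """Describe how a comp set was produced: filters, exclusions, and source."""
    # Sorting keys makes the fingerprint independent of the order parameters were entered.
    canonical = json.dumps(params, sort_keys=True)
    fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return {
        "run_at": (run_at or datetime.now(timezone.utc)).isoformat(),
        "data_source": data_source,
        "params": params,
        "params_fingerprint": fingerprint,
        "exclusions": exclusions,
    }


record = build_audit_record(
    params={"submarket": "Las Colinas", "building_class": "A", "target_sf": 15000},
    exclusions=["subleases excluded from primary set", "renewals flagged"],
    data_source="CoStar export",  # free-text label, not an API call
)
print(json.dumps(record, indent=2))
```

Stored next to the comp table, a record like this answers the questions that tend to come up in a negotiation: which filters were applied, what was left out, and where the data came from.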

What this looks like in practice

A regional brokerage in the DFW market working a 15,000 SF Class A office requirement in Las Colinas runs the following practical workflow through a comp-matching engine: specify the subject parameters, pull the filtered pool from CoStar via API, apply standard exclusions (renewals flagged rather than excluded, subleases excluded from primary set), cross-reference CompStak for concession data on the top 10 matches, rank by recency and submarket proximity, and output a formatted comp table with deal attributes. That process, done manually in a spreadsheet, takes 2–3 hours. Through a structured matching workflow, the baseline comp set is ready in 15–20 minutes, with the analyst's time concentrated on the judgment steps that actually require their expertise.
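The workflow above can be sketched end to end over in-memory records. Everything here is a stand-in: the two inputs take the place of whatever a CoStar pull and a CompStak lookup would return (no real API is shown), the field names are hypothetical, and the sample deals are made up. The sketch ranks before attaching concession data so that "top 10" is well defined.

```python
from typing import Dict, List


def build_comp_table(
    pool: List[dict], concessions: Dict[str, dict], top_n: int = 10
) -> List[dict]:
    # Standard exclusions: subleases leave the primary set, renewals stay but are flagged.
    primary = [
        {**c, "flag": "renewal" if c["deal_type"] == "renewal" else ""}
        for c in pool
        if c["deal_type"] != "sublease"
    ]
    # Rank by recency first, then prefer comps in the subject's micro-market.
    primary.sort(key=lambda c: (c["months_since_close"], not c["same_micro_market"]))
    top = primary[:top_n]
    # Attach concession data where the second source has a matching record.
    for c in top:
        c["free_rent_months"] = concessions.get(c["comp_id"], {}).get("free_rent_months")
    return top


def format_table(rows: List[dict]) -> str:
    cols = ["comp_id", "sf", "months_since_close", "flag", "free_rent_months"]
    lines = [" | ".join(cols)]
    for r in rows:
        lines.append(" | ".join("" if r[c] is None else str(r[c]) for c in cols))
    return "\n".join(lines)


# Made-up sample records, for illustration only.
pool = [
    {"comp_id": "a", "deal_type": "new", "sf": 14000, "months_since_close": 5, "same_micro_market": True},
    {"comp_id": "b", "deal_type": "renewal", "sf": 16500, "months_since_close": 3, "same_micro_market": True},
    {"comp_id": "c", "deal_type": "sublease", "sf": 15000, "months_since_close": 2, "same_micro_market": True},
    {"comp_id": "d", "deal_type": "new", "sf": 12000, "months_since_close": 5, "same_micro_market": False},
]
concessions = {"a": {"free_rent_months": 4}}

table = build_comp_table(pool, concessions)
print(format_table(table))
```

A missing concession record is left blank rather than filled with a guess, so the analyst can see which comps still need a manual check.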

That's what a comp-matching engine actually does. Not magic — structured workflow applied consistently, at the speed that regional brokerage deal timelines require.

To see what the comp-matching engine produces against your own submarket and active requirements, request a demo. We'll run a live comp set against one of your current deals so you can assess the output directly.
