Purpose: Apply to every behavioral indicator before including it in the Behavioral Indicator Matrix. All four questions must pass. A single failure requires revision — not rationalization. Use the Session Log at the bottom to document calibration decisions.
Apply to every behavioral indicator before including it in the Behavioral Indicator Matrix (Tool 3). All four questions must pass — a single failure requires revision. This card is designed for use during calibration sessions and ongoing model review.
Q1Observability
Can two evaluators independently observe this behavior in the same situation and agree on whether it occurred?
Fail signal: Uses internal states — understands, appreciates, knows, values, is aware of — or any phrase describing what the practitioner thinks or feels rather than what they do.
Fix: Replace the internal state with the external behavior that demonstrates it.
"Understands customer priorities" → "Restates the customer's stated business objectives before proposing any recommendation, without prompting"
"Understands customer priorities" → "Restates the customer's stated business objectives before proposing any recommendation, without prompting"
Q2Behavioral Specificity
Does this describe what the person does, not what they are?
Fail signal: Trait labels — is empathetic, thinks strategically, is proactive, is a team player, is customer-centric, is a good communicator.
Fix: Replace trait labels with the behaviors that constitute the trait.
"Is proactive" → "Identifies potential account risks before they appear in health score data and initiates outreach without prompt from manager"
"Is proactive" → "Identifies potential account risks before they appear in health score data and initiates outreach without prompt from manager"
Q3Level Discrimination
Is this indicator qualitatively distinct from the adjacent level — not just "more" of the same?
Fail signal: Adjacent levels can only be distinguished by frequency (sometimes vs. usually) or quality adverbs (adequately vs. effectively). If you can swap "sometimes" for "usually" and get the next level, the calibration is wrong.
Fix: Identify the qualitative capability shift. Ask: "What does a Level 3 practitioner do that is literally impossible for a Level 2 practitioner — not just less likely?" If you cannot answer this, the level distinction has not been found yet.
Q4Developability
Can this behavior be built through deliberate practice and experience? Is it a learnable pattern, not a fixed trait?
Fail signal: Personality characteristics practitioners either have or don't — is charismatic, is naturally curious, has high emotional intelligence, is a natural leader.
Fix: Identify the behavioral equivalent that can be developed through practice.
"Is naturally curious" → "Asks at least two exploratory questions beyond the customer's stated agenda in each discovery meeting, documented in call notes"
"Is naturally curious" → "Asks at least two exploratory questions beyond the customer's stated agenda in each discovery meeting, documented in call notes"
Session Use — Indicator Review Log Use during calibration sessions to document indicator decisions
| Indicator (first 10 words) | Q1 Observable? |
Q2 Behavioral? |
Q3 Level Distinct? |
Q4 Developable? |
Decision | Revision Note |
|---|