Implement eval collection and filtering mechanism
The user wants to add a --collect-only flag (or similar) to the eval command that would show which evals will be run without actually executing them.
Let me understand the current flow:
- bin.ts - Sets up the CLI with commander
  - The eval command is loaded via loadEvalCommand(program, overrides)
- run-vitest.ts - Actually runs vitest with the provided options
- eval.ts - Defines the Eval() function that registers evaluations using vitest's describe()
The user mentions that currently they use enabled: !opts.debug in run-vitest.ts to control instrumentation, but they're wondering if that's the right approach for a collect-only mode.
Let me think about different approaches:
Option 1: Add a --collect-only flag that disables test execution in Vitest
- Vitest has a --list or --reporter=json mode that can list tests
- We could add a collectOnly option and pass it to vitest
- Vitest might have built-in support for this
Option 2: Dry-run mode that loads files but doesn't execute
- Similar to --debug mode
- Would still register the evals but not run them
- Could extract metadata from the registered tests
Option 3: Use Vitest's built-in listing capabilities
- Vitest might have a flag to list tests without running them
- We could leverage that
Let me check what Vitest supports... I should look for Vitest's API options.
The key considerations:
- We want to show which eval files will run
- We want to show the eval names/descriptions
- We don't want to actually execute the expensive LLM calls
- We probably still need to load the data to know how many cases there are
Pros/cons of different approaches:
Option 1: Use Vitest's list or similar option
Vitest's built-in listing provides a straightforward way to discover tests with minimal custom implementation. It offers a clean approach to showing available evaluations without running them. The challenge lies in getting precise control over the output and ensuring no expensive operations are triggered.
Option 2: Add collectOnly flag
A custom flag gives maximum flexibility in controlling test discovery and output. This approach mirrors existing testing frameworks' collect-only modes, providing a familiar pattern for users. The implementation would require careful handling to prevent unintended side effects during test loading.
The most promising direction seems to be a hybrid approach that combines Vitest's native listing capabilities with custom output formatting to meet the specific needs of showing eval details without execution. I'll explore adding a collectOnly flag to efficiently list tests without full execution. The approach involves passing this flag through the Vitest configuration and potentially creating a custom reporter to handle test collection. We need to carefully design how this flag interacts with test registration and data loading to minimize overhead while providing useful test information.
The key considerations involve determining how to intercept test registration, prevent full test execution, and potentially show test metadata without running the entire test suite. A CLI flag seems the most straightforward mechanism for triggering this behavior.
I'll analyze the codebase to understand the eval execution flow and explore options for a --collect-only feature.
Excellent! Now I have enough information. Let me present my analysis:
Analysis: --collect-only Feature for Eval Command
Here are the main approaches I'd recommend:
Option 1: Use Vitest's Built-in collect() Method ⭐ RECOMMENDED
Implementation:
- Add --collect-only flag to the eval command
- Pass collectOnly: boolean to runVitest()
- Instead of calling vi.start(), call vi.collect() when collectOnly is true
- Format and print the collected test structure
Pros:
- Leverages Vitest's native functionality
- Will execute data() functions, so you can show case counts
- Executes test file registration (the describe callbacks), so all eval metadata is available
- Consistent with how Vitest works internally
- Relatively simple implementation
Cons:
- Still loads and executes data() functions (which might be slow for large datasets)
- Not completely "free" - files are imported and describe blocks run
- Need to disable instrumentation separately (but you already do this for debug mode)
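To make this trade-off concrete, here is a minimal sketch of why a collect pass can report case counts without paying for LLM calls. It is a toy model under stated assumptions, not the real Eval() implementation or Vitest internals: registerEval, collectEvals, and the EvalSpec shape are hypothetical names used only for illustration.

```typescript
// Toy model of registration vs. execution. All names are hypothetical and
// do not reflect the real Eval() implementation or Vitest internals.
interface EvalSpec {
  data: () => Promise<unknown[]>;             // loads the dataset
  task: (input: unknown) => Promise<unknown>; // expensive: would call the LLM
}

interface CollectedEval {
  name: string;
  caseCount: number;
}

const registry: { name: string; spec: EvalSpec }[] = [];

// Registration only stores the spec; neither data() nor task() runs here.
function registerEval(name: string, spec: EvalSpec): void {
  registry.push({ name, spec });
}

// Collection calls data() to count cases but never calls task().
async function collectEvals(): Promise<CollectedEval[]> {
  const collected: CollectedEval[] = [];
  for (const { name, spec } of registry) {
    const cases = await spec.data();
    collected.push({ name, caseCount: cases.length });
  }
  return collected;
}
```

In this model the cost of collection is exactly the cost of importing the files plus running every data() loader, which matches the cons listed above.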
Option 2: Use Vitest's Experimental Static Analysis
Implementation:
- Use vitest.experimental_parseSpecifications() for static analysis
- Parse test structure without executing any code
Pros:
- Fastest option - no code execution
- Zero cost for expensive data loading
- No risk of side effects from test file imports
Cons:
- Marked as experimental API
- Cannot show dataset sizes (data() never executes)
- Won't detect dynamically named tests
- Less information available for users
- More likely to break with Vitest updates
Option 3: Custom Reporter + Skip Mode
Implementation:
- Add collectOnly flag
- Use Vitest's mode.skip or similar to skip test execution
- Create custom reporter behavior for collect-only mode
Pros:
- More control over output format
- Could potentially skip data loading with careful implementation
Cons:
- More complex - requires understanding Vitest's skip mechanisms
- data() functions might still execute during describe phase
- More maintenance burden
My Recommendation: Option 1
Use Vitest's collect() method with these implementation details:
    // In run-vitest.ts
    import { createVitest } from 'vitest/node';

    export const runVitest = async (dir: string, opts: {
      // ... existing opts
      collectOnly?: boolean;
    }) => {
      // ... existing setup
      const vi = await createVitest('test', { /* config */ });

      if (opts.collectOnly) {
        // List mode: collect test files without running test callbacks
        const result = await vi.collect();
        printCollectedEvals(result); // Custom printer (still to be written)
        await vi.close();
        return;
      }

      // Normal execution path
      await vi.start();
      // ...
    };
Key decisions:
- Disable instrumentation when collectOnly is true (similar to debug mode)
- Don't load baselines when collecting (saves API calls)
- Do execute data() functions so you can show: "ProductQuality eval: 15 cases"
- Skip all scorer/task execution (Vitest handles this automatically)
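These decisions could be gathered into one small pure function so the flag handling stays in a single place. This is only a sketch: RunOptions, RunSettings, and resolveRunSettings are hypothetical names, and it assumes baseline loading depends only on collectOnly, which the real run-vitest.ts may not do.

```typescript
// Hypothetical option and settings shapes; the real runVitest options may differ.
interface RunOptions {
  debug?: boolean;
  collectOnly?: boolean;
}

interface RunSettings {
  instrumentation: boolean; // whether instrumentation is initialized
  loadBaselines: boolean;   // whether baselines are fetched (costs API calls)
  mode: "collect" | "run";
}

function resolveRunSettings(opts: RunOptions): RunSettings {
  const collectOnly = opts.collectOnly === true;
  const debug = opts.debug === true;
  return {
    // Extends the existing `enabled: !opts.debug` check to collect-only mode.
    instrumentation: !debug && !collectOnly,
    // Assumption: only collect-only mode skips baselines.
    loadBaselines: !collectOnly,
    mode: collectOnly ? "collect" : "run",
  };
}
```

Keeping this separate from runVitest means the combinations of debug and collectOnly can be checked without starting Vitest at all.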
This gives users a helpful preview like:
    📋 Found 3 evaluations:
    ✓ product-description.eval.ts
      • Product Description Quality (12 cases)
    ✓ search-relevance.eval.ts
      • Search Ranking Quality (50 cases)
    ✓ classification.eval.ts
      • Intent Classification (8 cases)
    Total: 3 evaluations, 70 test cases
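A printer that produces this kind of preview might look roughly like the sketch below. It works on a simplified, hypothetical input shape (CollectedFile and CollectedEval are not Vitest types), so the real printCollectedEvals would first need to map whatever vi.collect() returns into something like it.

```typescript
// Hypothetical shapes for collected results; these are not Vitest's own types.
interface CollectedEval {
  name: string;
  caseCount: number;
}

interface CollectedFile {
  file: string;
  evals: CollectedEval[];
}

// Renders "1 case" / "12 cases" style counts.
const plural = (n: number, word: string): string =>
  `${n} ${word}${n === 1 ? "" : "s"}`;

function formatCollectedEvals(files: CollectedFile[]): string {
  const lines: string[] = [];
  let evalCount = 0;
  let caseCount = 0;
  for (const { file, evals } of files) {
    lines.push(`✓ ${file}`);
    for (const ev of evals) {
      lines.push(`  • ${ev.name} (${plural(ev.caseCount, "case")})`);
      evalCount += 1;
      caseCount += ev.caseCount;
    }
  }
  const header = `📋 Found ${plural(evalCount, "evaluation")}:`;
  const footer = `Total: ${plural(evalCount, "evaluation")}, ${plural(caseCount, "test case")}`;
  return [header, ...lines, footer].join("\n");
}
```

Returning a string instead of printing directly keeps the formatter easy to test; the caller can pass the result to console.log.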
The user wants me to implement the --collect-only feature using Option 1 (Vitest's collect() method) with the custom printer I suggested.
Let me plan the implementation:
- Add --collect-only flag to the eval command in eval.command.ts
- Pass collectOnly option to runVitest in run-vitest.ts
- Modify run-vitest.ts to handle collect-only mode:
  - Skip instrumentation initialization when collectOnly is true
  - Call vi.collect() instead of vi.start()
  - Create a custom printer function to display the results
- Make sure to skip baseline loading when in collect-only mode
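The collect-versus-start branch in this plan can be sketched against a small interface rather than a real Vitest instance, which also makes it easy to exercise with a fake. VitestLike and runOrCollect are hypothetical names; the interface only mirrors the three methods used in the earlier runVitest snippet, with simplified return types.

```typescript
// Minimal stand-in for the parts of a Vitest instance this sketch touches.
// Method names mirror the earlier snippet; return types are simplified.
interface VitestLike {
  collect(): Promise<unknown>;
  start(): Promise<unknown>;
  close(): Promise<void>;
}

async function runOrCollect(
  vitest: VitestLike,
  collectOnly: boolean,
  print: (collected: unknown) => void,
): Promise<"collected" | "ran"> {
  if (!collectOnly) {
    // Normal execution path, left as in the existing code.
    await vitest.start();
    return "ran";
  }
  try {
    print(await vitest.collect());
    return "collected";
  } finally {
    // Close even if collect() or the printer throws.
    await vitest.close();
  }
}
```

Compared with the inline version, the try/finally means the instance is still closed when collection or printing fails.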
Let me start by creating the todo list and then implementing these changes.