Microsoft
Phi 4
Rank: #58
Score Overview
Overall Score: 20.2%
Accuracy: 16.7%
Syntax Valid: 56.7%
Score Dimensions
Correctness: 26
Best Practices: 93
Performance: 20
Clarity: 20
Performance by Complexity
Basic: 84.5%
Intermediate: 9.6%
Advanced: 8.3%
Tasks Correct: 5 / 30
Avg Response Time: 3742ms
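The Accuracy and Avg Response Time figures can be cross-checked against the per-task scores and times listed in the Task Breakdown. The sketch below is a sanity check, not the leaderboard's own scoring code. It assumes that a task counts as "correct" only when it scores 100%, that Accuracy is correct tasks divided by total tasks, and that the average time is a plain arithmetic mean. The variable names are illustrative.

```python
# Per-task scores (%) and response times (ms), copied from the Task Breakdown
# in task order (task-001 .. task-030).
scores_pct = [100] * 5 + [10] * 20 + [30] + [10] * 4
times_ms = [
    845, 2573, 2148, 2349, 721,
    2796, 2357, 1263, 4792, 3035,
    3210, 3203, 5220, 4015, 4447,
    3544, 4924, 3116, 2362, 4276,
    4299, 3226, 5106, 1074, 2398,
    15737, 4467, 6985, 3665, 4119,
]

# Assumption: a task is "correct" only at a 100% score.
tasks_total = len(scores_pct)
tasks_correct = sum(1 for s in scores_pct if s == 100)

# Assumption: Accuracy = correct / total, shown to one decimal place.
accuracy_pct = round(100 * tasks_correct / tasks_total, 1)

# Assumption: Avg Response Time is the unweighted mean, rounded to whole ms.
avg_time_ms = round(sum(times_ms) / len(times_ms))

print(f"Tasks correct: {tasks_correct} / {tasks_total}")
print(f"Accuracy: {accuracy_pct}%")
print(f"Avg response time: {avg_time_ms}ms")
```

Under these assumptions both figures match the reported values. The Overall Score of 20.2% is not a simple mean of the per-task scores, so it presumably applies a weighting that this page does not describe.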
Task Breakdown
Task | Category | Score | Time
Total Sales Amount (task-001) | aggregation | 100% | 845ms
Count of Customers (task-002) | aggregation | 100% | 2573ms
Average Unit Price (task-003) | aggregation | 100% | 2148ms
Distinct Product Count (task-004) | aggregation | 100% | 2349ms
Total Order Quantity (task-005) | aggregation | 100% | 721ms
Year-to-Date Sales (task-006) | time intelligence | 10% (Query Failed) | 2796ms
Previous Year Sales (task-007) | time intelligence | 10% (Query Failed) | 2357ms
Sales by Category Filter (task-008) | filtering | 10% (Query Failed) | 1263ms
Year-over-Year Growth Percentage (task-009) | calculation | 10% (Query Failed) | 4792ms
Running Total with CALCULATE and FILTER (task-010) | iterator | 10% (Query Failed) | 3035ms
Sales Summary by Category (task-011) | table manipulation | 10% (Query Failed) | 3210ms
Product List with Renamed Columns (task-012) | table manipulation | 10% (Query Failed) | 3203ms
Union of High-Value Transactions (task-013) | table manipulation | 10% (Query Failed) | 5220ms
Year-Category Analysis Matrix (task-014) | table manipulation | 10% (Query Failed) | 4015ms
Product Percentage of Category Total (task-015) | context transition | 10% (Query Failed) | 4447ms
Virtual Relationship with TREATAS (task-016) | context transition | 10% (Query Failed) | 3544ms
Granularity-Aware Measure with VALUES (task-017) | context transition | 10% (Query Failed) | 4924ms
Running Count with EARLIER (task-018) | context transition | 10% (Query Failed) | 3116ms
Multiple Filter Conditions (task-019) | filtering | 10% (Query Failed) | 2362ms
Percentage of Total with ALLEXCEPT (task-020) | filtering | 10% (Query Failed) | 4276ms
Filter Intersection with KEEPFILTERS (task-021) | filtering | 10% (Query Failed) | 4299ms
Product Ranking with RANKX (task-022) | iterator | 10% (Query Failed) | 3226ms
Top 5 Products with TOPN (task-023) | table manipulation | 10% (Query Failed) | 5106ms
90th Percentile Order Value (task-024) | iterator | 10% (Query Failed) | 1074ms
Handle Missing Data with BLANK (task-025) | calculation | 10% (Query Failed) | 2398ms
Safe Ratio with Cascading Fallbacks (task-026) | calculation | 30% | 15737ms
Safe Year-over-Year with Missing Data (task-027) | time intelligence | 10% (Query Failed) | 4467ms
3-Month Rolling Average (task-028) | time intelligence | 10% (Query Failed) | 6985ms
Same Month Previous Year Comparison (task-029) | time intelligence | 10% (Query Failed) | 3665ms
Fiscal Year-to-Date (July Start) (task-030) | time intelligence | 10% (Query Failed) | 4119ms