Llama 3.3 70B Instruct (Meta)
Rank: #53
Score Overview
Overall Score: 46.6%
Accuracy: 40.0%
Syntax Valid: 100.0%
Score Dimensions
Correctness: 52
Best Practices: 61
Performance: 69
Clarity: 69
Performance by Complexity
Basic: 99.9%
Intermediate: 35.6%
Advanced: 39.0%
Tasks Correct: 12 / 30
Avg Response Time: 2613ms
Task Breakdown
| Task | ID | Category | Score | Time | Status |
| --- | --- | --- | --- | --- | --- |
| Total Sales Amount | task-001 | aggregation | 100% | 1015ms | |
| Count of Customers | task-002 | aggregation | 100% | 777ms | |
| Average Unit Price | task-003 | aggregation | 100% | 955ms | |
| Distinct Product Count | task-004 | aggregation | 100% | 4538ms | |
| Total Order Quantity | task-005 | aggregation | 100% | 976ms | |
| Year-to-Date Sales | task-006 | time intelligence | 10% | 3552ms | Query Failed |
| Previous Year Sales | task-007 | time intelligence | 10% | 900ms | Query Failed |
| Sales by Category Filter | task-008 | filtering | 100% | 813ms | |
| Year-over-Year Growth Percentage | task-009 | calculation | 30% | 3134ms | |
| Running Total with CALCULATE and FILTER | task-010 | iterator | 30% | 3286ms | |
| Sales Summary by Category | task-011 | table manipulation | 30% | 8389ms | |
| Product List with Renamed Columns | task-012 | table manipulation | 100% | 1823ms | |
| Union of High-Value Transactions | task-013 | table manipulation | 10% | 2025ms | Query Failed |
| Year-Category Analysis Matrix | task-014 | table manipulation | 10% | 1720ms | Query Failed |
| Product Percentage of Category Total | task-015 | context transition | 30% | 1700ms | |
| Virtual Relationship with TREATAS | task-016 | context transition | 10% | 1563ms | Query Failed |
| Granularity-Aware Measure with VALUES | task-017 | context transition | 30% | 2131ms | |
| Running Count with EARLIER | task-018 | context transition | 10% | 2105ms | Query Failed |
| Multiple Filter Conditions | task-019 | filtering | 100% | 1328ms | |
| Percentage of Total with ALLEXCEPT | task-020 | filtering | 100% | 583ms | |
| Filter Intersection with KEEPFILTERS | task-021 | filtering | 100% | 2237ms | |
| Product Ranking with RANKX | task-022 | iterator | 10% | 1682ms | Query Failed |
| Top 5 Products with TOPN | task-023 | table manipulation | 10% | 2749ms | Query Failed |
| 90th Percentile Order Value | task-024 | iterator | 100% | 903ms | |
| Handle Missing Data with BLANK | task-025 | calculation | 100% | 1029ms | |
| Safe Ratio with Cascading Fallbacks | task-026 | calculation | 10% | 1507ms | Query Failed |
| Safe Year-over-Year with Missing Data | task-027 | time intelligence | 30% | 7217ms | |
| 3-Month Rolling Average | task-028 | time intelligence | 30% | 10661ms | |
| Same Month Previous Year Comparison | task-029 | time intelligence | 30% | 2684ms | |
| Fiscal Year-to-Date (July Start) | task-030 | time intelligence | 30% | 4416ms | |