A usage tool for the people using AI
| My Role: | Product Designer |
| Company: | Speculative project |
| Project: | Tally, a third-party observability product for Anthropic API usage |
| Methods: | Speculative design, product architecture, interaction design, UX writing |
| Deliverables: | Three-view dashboard system and in-app/email tip flows |
We see less of our AI usage than almost anything else we do at work. We get monthly reports on cloud spend, weekly metrics on shipped features, and dashboards with real-time metrics like traffic and revenue. But the tool we’re reaching for throughout the day is mostly invisible until the bill arrives.
Tally is a speculative design exploration of what closing that gap could look like.
Conducted in May 2026
What is Tally?
I started this project after a week of hitting my own Claude usage limits an hour into what I'd call light work. That's the problem at the scale of one person. Scale it to a team and it compounds: three designers on a six-person DesignOps team can hit their limits before lunch, and the team lead has no more insight into why than the designers do.
Existing tools in this space are built for the bill payer. The practitioner gets a fraction of what they'd need to work more efficiently.
Tally is a third-party observability product for Anthropic API usage. It sits on top of the data Anthropic already exposes and reshapes it for the people doing the work. Solo developers, team leads, and org admins all live inside one product, because AI usage moves across those scales fluidly. The same person can be a solo developer on Saturday, a team lead on Monday, and an admin reviewing spend on Friday, so the tool should follow the work rather than the org chart.
This case study is not a shipped product. It is an exploration of what practitioner-first AI tooling could look like.
Understanding the Users
The shape of this user came from two places. The first was my own usage. I hit limits mid-task. I had no idea which prompts cost more than others. I suspected I could work more efficiently, but no tool would tell me how.
The second was an ongoing conversation with a friend, about how he and his team navigate model choice, usage caps, and the tradeoffs between throughput and cost. While our domains differed, the friction was the same.
Those two inputs converged on a recognizable user. I built a composite persona to anchor the design conversations.
Maya
Senior Product Designer
Team size
6
|
Profile
|
Behaviors Uses Claude across multiple workflows daily
Switches between writing, prototyping, and research
Works closely with a small design ops team
|
||||||
|
Goals +Stay in flow without breaking concentration
+Understand her own work rhythms and patterns
+Work efficiently without feeling watched
|
Pain points –Hits usage limits without warning
–No insight into prompt-level cost
–Unclear visibility from her team lead
|
Maya, Senior Product Designer, DesignOps team of six. Composite persona built from firsthand usage and peer interviews.
Help Before Oversight
Most usage dashboards default to surveillance. They give managers a view into their reports' work and frame it as productivity data. The metrics are real. The design is structured around the manager's anxiety.
Tally takes a different position. When it surfaces an individual, it surfaces them by helping them first.
Privacy boundary diagram. Tally detects a pattern, then sends two signals in parallel: the fix to the individual, and a status notification to the lead. The two streams never converge.
The same line appears on both surfaces, in the same words. On the designer's view:
Your team lead can see this tip was sent. They can't see what it says.
On the team lead's view:
You can see that tips were sent. You can't see what they say.
The boundary is a real constraint of the product, written into the interface so both sides can see it.
Two conditions have to be true for this to work:
The help has to actually be good. A vague nag delivered with respect is still a nag. Tips are specific, grounded in the user's actual work, and bounded to one suggestion and one fix.
The user has to control the system. Frequency, channels, opt-out - all of it lives in settings the user owns, not the lead.
A team lead pushing back might reasonably say: I need to know about a problem before my report does, so I can manage it. Tally's answer is that the problem is being managed by the tool, with better technical suggestions than most managers would give, and with a record of whether the help is working. The lead's job shifts from initiating action to observing whether help is landing.
The Solution
Tally serves three contexts - solo, team, and org - through views that share the same structure. Three questions appear in the same order at every scale:
How is the work going?
What are we building?
What is it costing?
1. Solo view
Quality on top. Cache hit rate, model fit, and latency are the headline. Spend lives in the footer.
A heatmap colored by model mix. The colors trace someone's actual day with the model: Opus-heavy in the morning, Sonnet-led in the afternoon.
Project tags roll up. This shows where the attention actually went.
tally
Adam · last 30 days
Working well
Cache hit rate
68%
↑ 9 pts vs prev
Model fit
Healthy
Sonnet leads, balanced mix
Avg latency
1.2s
No regressions
This month's work
When you build
Color shows model mix
By project
From your request tags
What it cost
30-day spend
$47.23
$18.40 saved by cache
Pace
~$1.57/day
↓ 12% vs prev 30d
Solo view. Quality metrics lead, the heatmap surfaces model mix across the day, project breakdown sits below, and spend lives in the footer.
2. Team Lead view
tally
Design systems team · 6 people · 30 days
Team is working well
Cache hit rate
67%
5 of 6 above 60%
Model fit
Healthy
Sonnet-led across team
Help in flight
2
Tips sent this week
Tally sent Maya a tip about caching in mobile-audit
Awaiting review · 2 days ago
Tally sent Sam a tip about model selection on react-reconfig
Awaiting review · 4 days ago
You can see that tips were sent. You can't see what they say.
What we're building
By project
From request tags
How the team is doing
Adam G.
Design Ops Lead
Dana C.
Design Ops Program Manager
Jordan L.
Senior Design Technologist
Maya R.
Design Systems Designer
Priya T.
Senior Design Technologist
Sam K.
Design Systems Designer
What it cost
30-day spend
$412.80
↓ 8% vs prev 30d
Pace
~$13.76/day
$2.29 per person/day
Saved by cache
$163.40
28% of theoretical spend
Team Lead view. Each row leads with status rather than a metric. "Help in flight" appears at the top as a count of tips sent this week.
Status leads, metrics follow. Each row reads "Working well," "Tip sent, awaiting review," or "Tip applied last week." The numbers sit underneath as context.
"Help in flight" is the headline. A count of tips Tally surfaced this week sits at the top - the first thing the lead sees when they open the page.
3. Org Admin view
tally
Northwind Co. · 8 teams · 64 people · 30 days
Org is working well
Cache hit rate
71%
7 of 8 teams above 60%
Adoption
Steady
52 of 64 active this month
Help in flight
3 teams
Tips sent to leads this week
Worth a look
Marketing team's spend tripled this week
From $84 to $261, mostly on a new campaign-copy workspace · Worth a check-in
Engineering's cache hit rate jumped 14 points
Likely from prompt refactor on Apr 12 · Worth surfacing as a pattern
How teams are doing
Design Systems
Adam G. leads
Engineering
Talia W. leads
Marketing
Rae P. leads
Product
Jin H. leads
Research
Marcus E. leads
Sales
Lina O. leads
Support
Owen B. leads
People Ops
Hadley K. leads
What it cost
30-day spend
$4,455
↑ 6% vs prev 30d
Projected month
$4,820
Of $6,000 budget
Per person
$85.67
Across 52 active users
Saved by cache
$1,720
28% of theoretical spend
Monthly pace against budget
$6,000 ceiling
Org Admin view. The People table has become a Teams table. "Worth a look" surfaces both spend spikes and cache hit rate jumps in the same calm voice.
The architecture resists calling out individuals. The People table becomes a Teams table.
"Worth a look" surfaces good news too. A cache hit rate jump shows green and a spend spike shows amber, with the same typographic weight in both directions.
4. The Notification Maya Receives
When Tally surfaces something, the message has three jobs:
Tell her what was noticed.
Show her the fix.
State the privacy boundary, in plain English.
That last one sits at the bottom of every tip - in both the in-app version and the email - where the person being talked about can read it.
Your team lead can see this tip was sent. They can't see what it says
tally
Tips for Maya
Active
Tally noticed something in mobile-audit
2 days agoThree of your recent prompts share an 800-token preamble - the same character setup and tone instructions. Caching that prefix would cut their cost by about 70% and run them faster.
What this looks like
You'd add cache_control: ephemeral to the system message in those three calls. Anthropic's docs walk through the change.
Your team lead can see this tip was sent. They can't see what it says.
Past tips
How tally helps you
Send me tips
When Tally spots something worth a one-time fix
Email me a copy
Same tip in your inbox so you can act on it later
Frequency
At most one tip every two weeks
In-app version. The tip leads with what Tally noticed, shows the fix in concrete terms, and ends with the privacy boundary line.
Email version. Same three jobs, restructured for inbox reading. The privacy boundary line sits in the footer.
Reflection
The project came out of a question I kept hitting in my own AI usage and in conversations with other designers and engineers. The tools meant to help us understand our AI work were reporting on us instead. I wanted to see what an alternative looked like at the level of a real product.
The help-first constraint did more work than I expected. Once "leads see that a tip went out, never what it said" became the rule, the team view stopped functioning as a dashboard and became something closer to a coaching tool.