tally

A usage tool for the people using AI

My Role: Product Designer
Company: Speculative project
Project: Tally, a third-party observability product for Anthropic API usage
Methods: Speculative design, product architecture, interaction design, UX writing
Deliverables: Three-view dashboard system and in-app/email tip flows

We see less of our AI usage than almost anything else we do at work. We get monthly reports on cloud spend, weekly metrics on shipped features, and dashboards with real-time metrics like traffic and revenue. But the tool we’re reaching for throughout the day is mostly invisible until the bill arrives.

Tally is a speculative design exploration of what closing that gap could look like.

Conducted in May 2026

What is Tally?

I started this project after a week of hitting my own Claude usage limits an hour into what I'd call light work. That's the problem at the scale of one person. Scale it to a team and it compounds: three designers on a six-person DesignOps team can hit their limits before lunch, and the team lead has no more insight into why than the designers do.

Existing tools in this space are built for the bill payer. The practitioner gets a fraction of what they'd need to work more efficiently.

Tally is a third-party observability product for Anthropic API usage. It sits on top of the data Anthropic already exposes and reshapes it for the people doing the work. Solo developers, team leads, and org admins all live inside one product, because AI usage moves across those scales fluidly. The same person can be a solo developer on Saturday, a team lead on Monday, and an admin reviewing spend on Friday, so the tool should follow the work rather than the org chart.

This case study is not a shipped product. It is an exploration of what practitioner-first AI tooling could look like.

Understanding the Users

The shape of this user came from two places. The first was my own usage. I hit limits mid-task. I had no idea which prompts cost more than others. I suspected I could work more efficiently, but no tool would tell me how.

The second was an ongoing conversation with a friend about how he and his team navigate model choice, usage caps, and the tradeoffs between throughput and cost. Our domains differed, but the friction was the same.

Those two inputs converged on a recognizable user. I built a composite persona to anchor the design conversations.

Maya

Senior Product Designer

Team: DesignOps
Team size: 6
Frequency: Daily
Use cases: Writing, prototyping, research

Context
  • Uses Claude across multiple workflows daily
  • Switches between writing, prototyping, and research
  • Works closely with a small design ops team

Goals
  • Stay in flow without breaking concentration
  • Understand her own work rhythms and patterns
  • Work efficiently without feeling watched

Frustrations
  • Hits usage limits without warning
  • No insight into prompt-level cost
  • Unclear visibility from her team lead

Maya, Senior Product Designer, DesignOps team of six. Composite persona built from firsthand usage and peer interviews.

Help Before Oversight

Most usage dashboards default to surveillance. They give managers a view into their reports' work and frame it as productivity data. The metrics are real. The design is structured around the manager's anxiety.

Tally takes a different position. When it surfaces an individual, it surfaces them by helping them first.

Figure: Tally pattern detection and parallel notification flow. A three-lane flow diagram showing how Tally handles pattern detection. Top lane: Tally detects a pattern (example: prompts could be cached) and sends signals simultaneously to two recipients. Middle lane, separated by a privacy boundary: Maya the designer receives a personalized tip with concrete suggestions, while Adam the team lead sees that a tip was sent and whether it landed. Bottom lane: outcomes. Maya acts on the suggestion and the tip contents stay private; Adam gets visibility without surveillance, knowing it happened but not what was said.

Privacy boundary diagram. Tally detects a pattern, then sends two signals in parallel: the fix to the individual, and a status notification to the lead. The two streams never converge.

The same boundary appears on both surfaces, in mirrored wording. On the designer's view:

Your team lead can see this tip was sent. They can't see what it says.

On the team lead's view:

You can see that tips were sent. You can't see what they say.

The boundary is a real constraint of the product, written into the interface so both sides can see it.
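One way to make that constraint structural rather than cosmetic is to give each audience its own projection of a tip, so the lead-facing payload never contains the suggestion text at all. The sketch below is a minimal illustration under that assumption. The `Tip` fields, `recipient_view`, and `lead_view` are hypothetical names, and the date is an example value, not one taken from the mockups.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Tip:
    """A tip as stored internally. All field names are hypothetical."""
    recipient: str
    project: str
    topic: str     # short label the lead may see, e.g. "caching"
    body: str      # the concrete suggestion; private to the recipient
    sent_on: date
    status: str    # e.g. "awaiting review", "applied", "not useful"


PRIVACY_LINE_RECIPIENT = (
    "Your team lead can see this tip was sent. They can't see what it says."
)
PRIVACY_LINE_LEAD = "You can see that tips were sent. You can't see what they say."


def recipient_view(tip: Tip) -> dict:
    """Everything the recipient sees, including the suggestion itself."""
    return {
        "project": tip.project,
        "topic": tip.topic,
        "body": tip.body,
        "sent_on": tip.sent_on.isoformat(),
        "status": tip.status,
        "privacy_line": PRIVACY_LINE_RECIPIENT,
    }


def lead_view(tip: Tip) -> dict:
    """What the lead sees: that a tip went out and whether it landed.

    The body is never copied into this projection, so the lead-facing
    surface has nothing to leak.
    """
    return {
        "recipient": tip.recipient,
        "project": tip.project,
        "topic": tip.topic,
        "sent_on": tip.sent_on.isoformat(),
        "status": tip.status,
        "privacy_line": PRIVACY_LINE_LEAD,
    }


# Example values only; the date is illustrative.
example_tip = Tip(
    recipient="Maya",
    project="mobile-audit",
    topic="caching",
    body="Cache the shared preamble in these three calls.",
    sent_on=date(2026, 4, 28),
    status="awaiting review",
)
```

Both projections carry the privacy line, which mirrors the way the interface states the boundary on each side.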

Two conditions have to be true for this to work:

  • The help has to actually be good. A vague nag delivered with respect is still a nag. Tips are specific, grounded in the user's actual work, and bounded to one suggestion and one fix.

  • The user has to control the system. Frequency, channels, and opt-out all live in settings the user owns, not the lead.
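The second condition can be sketched as a small gate that consults only the recipient's own settings before anything is sent. This is an illustration, not a specification: `TipSettings`, `may_send_tip`, and `channels` are hypothetical names, and the 14-day default simply mirrors the "at most one tip every two weeks" setting shown later in the tips view.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class TipSettings:
    """Per-user settings. In this sketch only the user can change them."""
    tips_enabled: bool = True
    email_copy: bool = True
    min_days_between_tips: int = 14  # "At most one tip every two weeks"


def may_send_tip(settings: TipSettings, last_sent: Optional[date], today: date) -> bool:
    """Return True only if the user's own settings allow a tip today."""
    if not settings.tips_enabled:
        return False
    if last_sent is None:
        return True
    return today - last_sent >= timedelta(days=settings.min_days_between_tips)


def channels(settings: TipSettings) -> list[str]:
    """Delivery channels for a tip that is allowed to go out."""
    return ["in_app", "email"] if settings.email_copy else ["in_app"]
```

Nothing in the gate takes the lead as an input, which is the point: a lead cannot raise the frequency or re-enable tips on someone else's behalf.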

A team lead pushing back might reasonably say: I need to know about a problem before my report does, so I can manage it. Tally's answer is that the problem is being managed by the tool, with better technical suggestions than most managers would give, and with a record of whether the help is working. The lead's job shifts from initiating action to observing whether help is landing.

The Solution

Tally serves three contexts (solo, team, and org) through views that share the same structure. Three questions appear in the same order at every scale:

  • How is the work going?

  • What are we building?

  • What is it costing?

1. Solo view

  • Quality on top. Cache hit rate, model fit, and latency are the headline. Spend lives in the footer.

  • A heatmap colored by model mix. The colors trace someone's actual day with the model: Opus-heavy in the morning, Sonnet-led in the afternoon.

  • Project tags roll up. This shows where the attention actually went.

Solo view mockup (tally · Adam · last 30 days)
Range selector: 7d · 30d · 90d

Working well
  • Cache hit rate: 68% (↑ 9 pts vs prev)
  • Model fit: Healthy (Sonnet leads, balanced mix)
  • Avg latency: 1.2s (no regressions)

This month's work
  • When you build: heatmap by hour of day, 12a to 11p. Color shows model mix (Quiet, Sonnet-led, Opus-heavy).
  • By project, from your request tags:
      documentation-v2 · 412 requests · 62%
      mobile-audit · 186 requests · 22%
      react-reconfig · 94 requests · 11%
      untagged · 42 requests · 5%

What it cost
  • 30-day spend: $47.23 ($18.40 saved by cache)
  • Pace: ~$1.57/day (↓ 12% vs prev 30d)

Solo view. Quality metrics lead, the heatmap surfaces model mix across the day, project breakdown sits below, and spend lives in the footer.
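For readers wondering where a headline like "Cache hit rate" could come from, here is one possible derivation. The three token fields are the ones the Anthropic Messages API reports in each response's `usage` object. The formula (cached reads as a share of all input tokens) is my assumption for this sketch, not an Anthropic definition, and `cache_hit_rate` is a hypothetical helper.

```python
def cache_hit_rate(usages: list[dict]) -> float:
    """Share of input tokens served from cache across a set of requests.

    Each item is a `usage` dict from a Messages API response. Missing or
    null cache fields are treated as zero.
    """
    read = sum(u.get("cache_read_input_tokens") or 0 for u in usages)
    written = sum(u.get("cache_creation_input_tokens") or 0 for u in usages)
    fresh = sum(u.get("input_tokens") or 0 for u in usages)
    total = read + written + fresh
    return read / total if total else 0.0


# Illustrative data: the first call writes the prefix to the cache,
# the next two read it back.
example_usages = [
    {"input_tokens": 200, "cache_creation_input_tokens": 800, "cache_read_input_tokens": 0},
    {"input_tokens": 200, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 800},
    {"input_tokens": 200, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 800},
]
example_rate = cache_hit_rate(example_usages)
```

A token-weighted rate like this favors long shared prefixes over many small requests. A request-weighted rate would be an equally defensible choice and would produce different numbers.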

2. Team Lead view

Team Lead view mockup (tally · Design systems team · 6 people · 30 days)
Range selector: 7d · 30d · 90d

Team is working well
  • Cache hit rate: 67% (5 of 6 above 60%)
  • Model fit: Healthy (Sonnet-led across team)
  • Help in flight: 2 (tips sent this week)
      Tally sent Maya a tip about caching in mobile-audit · Awaiting review · 2 days ago · View status ↗
      Tally sent Sam a tip about model selection on react-reconfig · Awaiting review · 4 days ago · View status ↗

You can see that tips were sent. You can't see what they say.

What we're building
  • By project, from request tags:
      design-system-v3 · 1,840 requests · 4 contributors · 48%
      mobile-audit · 920 requests · 2 contributors · 24%
      react-reconfig · 680 requests · 3 contributors · 18%
      research-spikes · 410 requests · 5 contributors · 10%

How the team is doing
Person · Cache · Top model · Status
Adam G. (Design Ops Lead) · 72% · Sonnet · Working well
Dana C. (Design Ops Program Manager) · 64% · Haiku · Working well
Jordan L. (Senior Design Technologist) · 78% · Sonnet · Working well
Maya R. (Design Systems Designer) · 22% · Opus · Tip sent, awaiting review
Priya T. (Senior Design Technologist) · 71% · Sonnet · Tip applied last week
Sam K. (Design Systems Designer) · 69% · Sonnet · Tip sent, awaiting review

What it cost
  • 30-day spend: $412.80 (↓ 8% vs prev 30d)
  • Pace: ~$13.76/day ($2.29 per person/day)
  • Saved by cache: $163.40 (28% of theoretical spend)

Team Lead view. Each row leads with status rather than a metric. "Help in flight" appears at the top as a count of tips sent this week.

  • Status leads, metrics follow. Each row reads "Working well," "Tip sent, awaiting review," or "Tip applied last week." The numbers sit underneath as context.

  • "Help in flight" is the headline. A count of tips Tally surfaced this week sits at the top, the first thing the lead sees when they open the page.
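The cost footer in the Team Lead mockup can be reproduced from three inputs. The sketch below is a worked check, assuming "theoretical spend" means actual spend plus the amount saved by cache. That reading is my assumption; it happens to reproduce the 28% in the mockup. `cost_footer` is a hypothetical helper.

```python
def cost_footer(spend: float, saved_by_cache: float, days: int, people: int) -> dict:
    """Derive the footer figures from period spend and cache savings."""
    pace = spend / days
    # Assumed definition: what the period would have cost with no caching.
    theoretical = spend + saved_by_cache
    return {
        "pace_per_day": round(pace, 2),
        "per_person_per_day": round(pace / people, 2),
        "saved_share": round(saved_by_cache / theoretical, 2),
    }


# Figures from the Team Lead mockup: $412.80 over 30 days, 6 people,
# $163.40 saved by cache.
team_footer = cost_footer(spend=412.80, saved_by_cache=163.40, days=30, people=6)
```

With those inputs the helper returns a pace of 13.76 per day, 2.29 per person per day, and a saved share of 0.28, matching the mockup.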

3. Org Admin view

Org Admin view mockup (tally · Northwind Co. · 8 teams · 64 people · 30 days)
Range selector: 7d · 30d · 90d

Org is working well
  • Cache hit rate: 71% (7 of 8 teams above 60%)
  • Adoption: Steady (52 of 64 active this month)
  • Help in flight: 3 teams (tips sent to leads this week)

Worth a look
  • Marketing team's spend tripled this week. From $84 to $261, mostly on a new campaign-copy workspace · Worth a check-in · Investigate ↗
  • Engineering's cache hit rate jumped 14 points. Likely from prompt refactor on Apr 12 · Worth surfacing as a pattern · See change ↗

How teams are doing
Team · People · Cache · Spend · Status
Design Systems (Adam G. leads) · 6 · 67% · $412 · Help in flight
Engineering (Talia W. leads) · 14 · 82% · $1,840 · Working well
Marketing (Rae P. leads) · 8 · 74% · $261 · Spike worth a look
Product (Jin H. leads) · 9 · 69% · $682 · Working well
Research (Marcus E. leads) · 5 · 72% · $394 · Help in flight
Sales (Lina O. leads) · 7 · 64% · $208 · Working well
Support (Owen B. leads) · 11 · 76% · $540 · Help in flight
People Ops (Hadley K. leads) · 4 · 54% · $118 · Onboarding still

What it cost
  • 30-day spend: $4,455 (↑ 6% vs prev 30d)
  • Projected month: $4,820 (of $6,000 budget)
  • Per person: $85.67 (across 52 active users)
  • Saved by cache: $1,720 (28% of theoretical spend)
  • Monthly pace against budget: $4,820 projected on a $0 to $6,000 scale ($6,000 ceiling)

Org Admin view. The People table has become a Teams table. "Worth a look" surfaces both spend spikes and cache hit rate jumps in the same calm voice.

  • The architecture resists calling out individuals. The People table becomes a Teams table.

  • "Worth a look" surfaces good news too. A cache hit rate jump shows green and a spend spike shows amber, with the same typographic weight in both directions.
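The "same weight in both directions" idea can be sketched as a single function that emits signals of the same shape for bad news and good news, differing only in tone. Everything here is illustrative: `Signal`, `worth_a_look`, and both thresholds are hypothetical, and the mockups do not give prior-period values for every metric, so the example passes placeholder values where the source is silent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    team: str
    metric: str    # "spend" or "cache_hit_rate"
    tone: str      # "amber" for a spend spike, "green" for a cache gain
    headline: str


def worth_a_look(
    team: str,
    spend_prev: float,
    spend_now: float,
    cache_prev: int,
    cache_now: int,
    spend_ratio: float = 2.0,   # hypothetical threshold
    cache_points: int = 10,     # hypothetical threshold
) -> list[Signal]:
    """Return signals for notable changes, good or bad, in one shape."""
    signals = []
    if spend_prev > 0 and spend_now / spend_prev >= spend_ratio:
        signals.append(Signal(
            team, "spend", "amber",
            f"{team}'s spend rose from ${spend_prev:.0f} to ${spend_now:.0f}",
        ))
    if cache_now - cache_prev >= cache_points:
        signals.append(Signal(
            team, "cache_hit_rate", "green",
            f"{team}'s cache hit rate jumped {cache_now - cache_prev} points",
        ))
    return signals


# Marketing: spend figures are from the mockup; the cache values are placeholders.
marketing = worth_a_look("Marketing", 84, 261, cache_prev=74, cache_now=74)
# Engineering: the 14-point jump to 82% is from the mockup; prior spend is a placeholder.
engineering = worth_a_look("Engineering", 1840, 1840, cache_prev=68, cache_now=82)
```

Because both kinds of signal share one type, the interface has no separate "alert" component to over-style, which keeps the calm voice a property of the data model as well as the copy.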

4. The Notification Maya Receives

When Tally surfaces something, the message has three jobs:

  1. Tell her what was noticed.

  2. Show her the fix.

  3. State the privacy boundary, in plain English.

That last one sits at the bottom of every tip, in both the in-app version and the email, where the person being talked about can read it.

Your team lead can see this tip was sent. They can't see what it says.

Tips view mockup (tally · Tips for Maya)
Navigation: Dashboard · Tips · Settings

Active

Tally noticed something in mobile-audit (2 days ago)

Three of your recent prompts share an 800-token preamble - the same character setup and tone instructions. Caching that prefix would cut their cost by about 70% and run them faster.

What this looks like: You'd add cache_control: ephemeral to the system message in those three calls. Anthropic's docs walk through the change.

Your team lead can see this tip was sent. They can't see what it says.

Past tips
  • Sonnet would handle most of your react-reconfig work · Sent Apr 14 · Applied
  • A long system message could move to the user turn · Sent Apr 2 · Not useful
  • Caching opportunity in your design-system-v3 prompts · Sent Mar 21 · Applied

How tally helps you
  • Send me tips: When Tally spots something worth a one-time fix
  • Email me a copy: Same tip in your inbox so you can act on it later
  • Frequency: At most one tip every two weeks (selected: Two weeks)

In-app version. The tip leads with what Tally noticed, shows the fix in concrete terms, and ends with the privacy boundary line.

From: tally <tips@tally.app> · 9:14 AM

Subject: A small caching change in mobile-audit

Hi Maya,

Three of your recent prompts share an 800-token preamble. Same character setup, same tone instructions. Caching that prefix would cut their cost by about 70% and run them faster.

What this looks like: add cache_control: ephemeral to the system message in those three calls. Anthropic's docs walk through it.

Tally watches for prompts that share long prefixes and aren't using cache yet. You'll see at most one of these every two weeks. Change that here.

- Tally

Your team lead can see this tip was sent to you. They can't see what it says. More on how this works.

Email version. Same three jobs, restructured for inbox reading. The privacy boundary line sits in the footer.
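To ground the tip in something concrete, here is a rough sketch of both halves: spotting a shared prefix, and the request fragment the tip points to. In Anthropic's prompt caching, the `system` parameter can be a list of text blocks, and a block is marked cacheable with `cache_control` set to `{"type": "ephemeral"}`. Everything else is my assumption: `suggest_caching`, `approx_tokens`, and the `min_tokens` threshold are hypothetical, and the four-characters-per-token estimate is a crude stand-in for a real tokenizer. No network call is made.

```python
import os


def shared_prefix(prompts: list[str]) -> str:
    """Longest character prefix common to every prompt."""
    return os.path.commonprefix(prompts)


def approx_tokens(text: str) -> int:
    """Very rough estimate (about 4 characters per token), not a tokenizer."""
    return len(text) // 4


def cached_system_blocks(prefix: str) -> list[dict]:
    """System prompt as content blocks, with the shared prefix marked
    as a cache breakpoint."""
    return [
        {
            "type": "text",
            "text": prefix,
            "cache_control": {"type": "ephemeral"},
        }
    ]


def suggest_caching(system_prompts: list[str], *, min_tokens: int):
    """Return a cacheable system block if the prompts share a long prefix,
    otherwise None. `min_tokens` is a hypothetical product threshold."""
    if len(system_prompts) < 2:
        return None
    prefix = shared_prefix(system_prompts)
    if approx_tokens(prefix) < min_tokens:
        return None
    return cached_system_blocks(prefix)


# Illustrative prompts: one long shared preamble, three different tasks.
preamble = "You are a friendly narrator. " * 200
example_prompts = [preamble + "Task A", preamble + "Task B", preamble + "Task C"]
example_blocks = suggest_caching(example_prompts, min_tokens=800)
```

One caveat a real version would need to handle: Anthropic documents a minimum cacheable prompt length that varies by model, so a tip like this should confirm that the shared prefix clears that minimum for the model in use before promising savings.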

Reflection

The project came out of a question I kept hitting in my own AI usage and in conversations with other designers and engineers. The tools meant to help us understand our AI work were reporting on us instead. I wanted to see what an alternative looked like at the level of a real product.

The help-first constraint did more work than I expected. Once "leads see that a tip went out, never what it said" became the rule, the team view stopped functioning as a dashboard and became something closer to a coaching tool.