Tally: a Usage Tool for People Using AI, Adam George — Adam George, Senior Product Designer, Atlanta, GA

tally

A usage tool for the people using AI

  
      My Role:
      Product Designer
    
      Company:
      Speculative project
    
      Project:
      Tally, a third-party observability product for Anthropic API usage
    
      Methods:
      Speculative design, product architecture, interaction design, UX writing
    
      Deliverables:
      Three-view dashboard system and in-app/email tip flows

We see less of our AI usage than almost anything else we do at work. We get monthly reports on cloud spend, weekly metrics on shipped features, and dashboards with real-time metrics like traffic and revenue. But the tool we’re reaching for throughout the day is mostly invisible until the bill arrives.

Tally is a speculative design exploration of what closing that gap could look like.

Conducted in May 2026

What is Tally?

I started this project after a week of hitting my own Claude usage limits an hour into what I'd call light work. That's the problem at the scale of one person. Scale it to a team and it compounds: three designers on a six-person DesignOps team can hit their limits before lunch, and the team lead has no more insight into why than the designers do.

Existing tools in this space are built for the bill payer. The practitioner gets a fraction of what they'd need to work more efficiently.

Tally is a third-party observability product for Anthropic API usage. It sits on top of the data Anthropic already exposes and reshapes it for the people doing the work. Solo developers, team leads, and org admins all live inside one product, because AI usage moves across those scales fluidly. The same person can be a solo developer on Saturday, a team lead on Monday, and an admin reviewing spend on Friday, so the tool should follow the work rather than the org chart.

This case study is not a shipped product. It is an exploration of what practitioner-first AI tooling could look like.

Understanding the Users

The shape of this user came from two places. The first was my own usage. I hit limits mid-task. I had no idea which prompts cost more than others. I suspected I could work more efficiently, but no tool would tell me how.

The second was an ongoing conversation with a friend, about how he and his team navigate model choice, usage caps, and the tradeoffs between throughput and cost. While our domains differed, the friction was the same.

Those two inputs converged on a recognizable user. I built a composite persona to anchor the design conversations.

M

Maya

Senior Product Designer

Team size

6

Profile

Team	DesignOps
Frequency	Daily
Use cases	Writing, prototyping, research

Behaviors

Uses Claude across multiple workflows daily

Switches between writing, prototyping, and research

Works closely with a small design ops team

Goals

+Stay in flow without breaking concentration

+Understand her own work rhythms and patterns

+Work efficiently without feeling watched

Pain points

–Hits usage limits without warning

–No insight into prompt-level cost

–Unclear visibility from her team lead

Maya, Senior Product Designer, DesignOps team of six. Composite persona built from firsthand usage and peer interviews.

Help Before Oversight

Most usage dashboards default to surveillance. They give managers a view into their reports' work and frame it as productivity data. The metrics are real. The design is structured around the manager's anxiety.

Tally takes a different position. When it surfaces an individual, it surfaces them by helping them first.

Privacy boundary diagram. Tally detects a pattern, then sends two signals in parallel: the fix to the individual, and a status notification to the lead. The two streams never converge.

The same line appears on both surfaces, in the same words. On the designer's view:

Your team lead can see this tip was sent. They can't see what it says.

On the team lead's view:

You can see that tips were sent. You can't see what they say.

The boundary is a real constraint of the product, written into the interface so both sides can see it.

Two conditions have to be true for this to work:

The help has to actually be good. A vague nag delivered with respect is still a nag. Tips are specific, grounded in the user's actual work, and bounded to one suggestion and one fix.
The user has to control the system. Frequency, channels, opt-out - all of it lives in settings the user owns, not the lead.

A team lead pushing back might reasonably say: I need to know about a problem before my report does, so I can manage it. Tally's answer is that the problem is being managed by the tool, with better technical suggestions than most managers would give, and with a record of whether the help is working. The lead's job shifts from initiating action to observing whether help is landing.

The Solution

Tally serves three contexts - solo, team, and org - through views that share the same structure. Three questions appear in the same order at every scale:

How is the work going?
What are we building?
What is it costing?

1. Solo view

Quality on top. Cache hit rate, model fit, and latency are the headline. Spend lives in the footer.
A heatmap colored by model mix. The colors trace someone's actual day with the model: Opus-heavy in the morning, Sonnet-led in the afternoon.
Project tags roll up. This shows where the attention actually went.

tally

Adam · last 30 days

7d 30d 90d

Working well

Cache hit rate

68%

↑ 9 pts vs prev

Model fit

Healthy

Sonnet leads, balanced mix

Avg latency

1.2s

No regressions

This month's work

When you build

Color shows model mix

Quiet Sonnet-led Opus-heavy

By project

From your request tags

documentation-v2412 requests 62%

mobile-audit186 requests 22%

react-reconfig94 requests 11%

untagged42 requests 5%

What it cost

30-day spend

$47.23

$18.40 saved by cache

Pace

~$1.57/day

↓ 12% vs prev 30d

Solo view. Quality metrics lead, the heatmap surfaces model mix across the day, project breakdown sits below, and spend lives in the footer.

2. Team Lead view

tally

Design systems team · 6 people · 30 days

7d 30d 90d

Team is working well

Cache hit rate

67%

5 of 6 above 60%

Model fit

Healthy

Sonnet-led across team

Help in flight

2

Tips sent this week

Tally sent Maya a tip about caching in mobile-audit

Awaiting review · 2 days ago

View status ↗

Tally sent Sam a tip about model selection on react-reconfig

Awaiting review · 4 days ago

View status ↗

You can see that tips were sent. You can't see what they say.

What we're building

By project

From request tags

design-system-v31,840 requests · 4 contributors 48%

mobile-audit920 requests · 2 contributors 24%

react-reconfig680 requests · 3 contributors 18%

research-spikes410 requests · 5 contributors 10%

How the team is doing

        Person
        Cache
        Top model
        Status
        
      

AG

Adam G.

Design Ops Lead

72% Sonnet Working well ↗

DC

Dana C.

Design Ops Program Manager

64% Haiku Working well ↗

JL

Jordan L.

Senior Design Technologist

78% Sonnet Working well ↗

MR

Maya R.

Design Systems Designer

22% Opus Tip sent · awaiting review ↗

PT

Priya T.

Senior Design Technologist

71% Sonnet Tip applied last week ↗

SK

Sam K.

Design Systems Designer

69% Sonnet Tip sent · awaiting review ↗

What it cost

30-day spend

$412.80

↓ 8% vs prev 30d

Pace

~$13.76/day

$2.29 per person/day

Saved by cache

$163.40

28% of theoretical spend

Team Lead view. Each row leads with status rather than a metric. "Help in flight" appears at the top as a count of tips sent this week.

Status leads, metrics follow. Each row reads "Working well," "Tip sent, awaiting review," or "Tip applied last week." The numbers sit underneath as context.
"Help in flight" is the headline. A count of tips Tally surfaced this week sits at the top - the first thing the lead sees when they open the page.

3. Org Admin view

tally

Northwind Co. · 8 teams · 64 people · 30 days

7d 30d 90d

Org is working well

Cache hit rate

71%

7 of 8 teams above 60%

Adoption

Steady

52 of 64 active this month

Help in flight

3 teams

Tips sent to leads this week

Worth a look

Marketing team's spend tripled this week

From $84 to $261, mostly on a new campaign-copy workspace · Worth a check-in

Investigate ↗

Engineering's cache hit rate jumped 14 points

Likely from prompt refactor on Apr 12 · Worth surfacing as a pattern

See change ↗

How teams are doing

        Team
        People
        Cache
        Spend
        Status
        
      

Design Systems

Adam G. leads

6 67% $412 Help in flight ↗

Engineering

Talia W. leads

14 82% $1,840 Working well ↗

Marketing

Rae P. leads

8 74% $261 Spike worth a look ↗

Product

Jin H. leads

9 69% $682 Working well ↗

Research

Marcus E. leads

5 72% $394 Help in flight ↗

Sales

Lina O. leads

7 64% $208 Working well ↗

Support

Owen B. leads

11 76% $540 Help in flight ↗

People Ops

Hadley K. leads

4 54% $118 Onboarding still ↗

What it cost

30-day spend

$4,455

↑ 6% vs prev 30d

Projected month

$4,820

Of $6,000 budget

Per person

$85.67

Across 52 active users

Saved by cache

$1,720

28% of theoretical spend

Monthly pace against budget

$6,000 ceiling

        $0
        $4,820 projected
        $6,000
      

Org Admin view. The People table has become a Teams table. "Worth a look" surfaces both spend spikes and cache hit rate jumps in the same calm voice.

The architecture resists calling out individuals. The People table becomes a Teams table.
"Worth a look" surfaces good news too. A cache hit rate jump shows green and a spend spike shows amber, with the same typographic weight in both directions.

4. The Notification Maya Receives

When Tally surfaces something, the message has three jobs:

Tell her what was noticed.
Show her the fix.
State the privacy boundary, in plain English.

That last one sits at the bottom of every tip - in both the in-app version and the email - where the person being talked about can read it.

Your team lead can see this tip was sent. They can't see what it says

tally

Tips for Maya

Dashboard Tips Settings

Active

Tally noticed something in mobile-audit

2 days ago

Three of your recent prompts share an 800-token preamble - the same character setup and tone instructions. Caching that prefix would cut their cost by about 70% and run them faster.

What this looks like

You'd add cache_control: ephemeral to the system message in those three calls. Anthropic's docs walk through the change.

Your team lead can see this tip was sent. They can't see what it says.

Past tips

Sonnet would handle most of your react-reconfig work

Sent Apr 14

Applied ↗

A long system message could move to the user turn

Sent Apr 2

Not useful ↗

Caching opportunity in your design-system-v3 prompts

Sent Mar 21

Applied ↗

How tally helps you

Send me tips

When Tally spots something worth a one-time fix

Email me a copy

Same tip in your inbox so you can act on it later

Frequency

At most one tip every two weeks

Two weeks ↓

In-app version. The tip leads with what Tally noticed, shows the fix in concrete terms, and ends with the privacy boundary line.

From 9:14 AM

tally <tips@tally.app>

A small caching change in mobile-audit

Hi Maya,

Three of your recent prompts share an 800-token preamble. Same character setup, same tone instructions. Caching that prefix would cut their cost by about 70% and run them faster.

What this looks like: add cache_control: ephemeral to the system message in those three calls. Anthropic's docs walk through it.

Show me how Not useful

Tally watches for prompts that share long prefixes and aren't using cache yet. You'll see at most one of these every two weeks. Change that here.

- Tally

Your team lead can see this tip was sent to you. They can't see what it says. More on how this works.

Email version. Same three jobs, restructured for inbox reading. The privacy boundary line sits in the footer.

Reflection

The project came out of a question I kept hitting in my own AI usage and in conversations with other designers and engineers. The tools meant to help us understand our AI work were reporting on us instead. I wanted to see what an alternative looked like at the level of a real product.

The help-first constraint did more work than I expected. Once "leads see that a tip went out, never what it said" became the rule, the team view stopped functioning as a dashboard and became something closer to a coaching tool.