My tech-lead skill solved the "Claude builds whatever it thinks you meant" problem. It asked questions before coding. It pushed back on bad ideas. But after using it for a few weeks, I noticed I was still the one managing everything around it. Remembering where we left off. Deciding which agents to spawn. Keeping scope in check. I was doing project management while pretending I wasn't.
So I built an actual project manager. A skill that owns the whole lifecycle from "I have a vague idea" to "here's the reviewed, implemented result." And the most important thing it does is something AI almost never does: it tells me no.
PMs get a bad rap
I should mention: I've never worked as a developer. Not professionally, anyway. My career has been 15+ years in IT hardware service - lead technician at a major Swedish IT consultancy, fixing client machines, managing SLAs, running a service workshop. The PMs I've encountered weren't in sprint planning meetings. They were in service delivery. Operational meetings with municipal clients. SLA reviews where someone had to explain why turnaround times slipped.
Most technicians I worked with saw PMs as a necessary evil. The ones who added process on top of process, scheduled meetings about meetings, and asked you to fill out forms that nobody would ever read. The ones who managed by spreadsheet and treated the actual work as an abstraction on a Gantt chart.
I get it. I've sat in those meetings. I've filled out those forms while quietly dying inside.
But I also worked with PMs who were different. The ones who actually sat down in the workshop, asked what was slowing us down, and then went and fixed the blocker instead of creating a new tracking sheet for it. Who caught scope creep before it ate the timeline. Who asked "do we actually need this process?" when everyone else was too comfortable to question it. They didn't manage by spreadsheet. They managed by understanding your pain points and clearing the path.
I built my AI project manager after that second kind. Not a gatekeeper. Not a process machine. A collaborator that happens to have the discipline I lack when I'm three coffees deep and convinced that "just one more feature" won't blow the scope.
Four phases, one rule
The PM runs a lifecycle: Discovery → Plan → Execute → Review. Each phase has a specific job, and it won't skip ahead. You can't say "just build it" and have it jump to execution. It'll push back. Politely. But firmly.
Same way those good PMs wouldn't let you skip the scoping conversation just because you were excited about the solution. The discipline is the feature.
Discovery: the uncomfortable part
This is where most of the value lives. And the shape of discovery depends on what you bring to it.
When you show up with something half-formed, the PM's first job is asking questions you don't want to answer:
You: "I want to add a notification system to the app"
PM: "Before we plan anything, I need to understand this better:
- Who specifically gets notified about what? 'Users get notifications'
is too vague - which users, which events, what's the trigger?
- What happens if we don't build this at all? What breaks?
- You said notification system - are we talking email, push, in-app,
or all three? Because that's three different projects.
- What's the smallest version that would actually be useful?"
You: "In-app only. When someone comments on your PR stats.
Users have been asking for it."
PM: "Better. Two more:
- How many users have actually asked vs. how many you think want it?
- What does 'done' look like? When do we stop iterating on this?"

Is it annoying? A little. Is it the conversation you should have before writing a single line of code? Absolutely. I've shipped features that nobody used because I skipped this step. More than once.
After the dialogue, the PM spawns specialist agents in the background. An Explore agent to scan the codebase for existing patterns. A system-architect to flag structural concerns. A security-engineer if the feature touches auth or user data. They run in parallel, and you never see their raw output. The PM synthesizes everything into a coherent picture:
PM: "I consulted with specialists. Here's what emerged:
- The codebase already has a notification pattern in lib/events/ that
uses a pub/sub model. We should follow it, not create a parallel system.
- Architecture concern: the current event bus is synchronous. In-app
notifications need async delivery or they'll block the PR stats write.
- Security surface is minimal - notifications are read-only and scoped
to the authenticated user. No special handling needed.
Discovery summary:
- Problem: Users want to know when someone interacts with their stats
- Constraints: Must use existing event bus, needs async delivery
- Risk: Event bus migration could cascade if done wrong
- Open question: Do we need read/unread state or just a feed?
- Recommendation: Build on existing pub/sub, add async wrapper,
start with a simple feed (no read/unread state yet)"

That last line - "start with a simple feed, no read/unread state yet" - is the PM managing scope before scope becomes a problem. Left to my own devices, I would have built read/unread tracking, notification preferences, email digests, and a kitchen sink. The PM said no. Build the small thing first.
This is what the good PMs do. They don't block you from building. They stop you from building the wrong thing, or too much of the right thing.
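The fan-out-and-synthesize step described above can be sketched in a few lines. The agent functions here are stand-ins for illustration, not Claude Code's actual agent-spawning mechanism:

```typescript
// Hypothetical sketch of parallel specialist consultation. Each "agent" is a
// stand-in async function; the real skill spawns Claude Code agents instead.
type AgentReport = { role: string; findings: string[] };

async function consultSpecialists(
  agents: Array<() => Promise<AgentReport>>
): Promise<string> {
  // Specialists run concurrently; the caller never sees their raw reports.
  const reports = await Promise.all(agents.map((run) => run()));
  // Only this synthesized summary reaches the user.
  return reports
    .map((r) => `${r.role}: ${r.findings.length} finding(s)`)
    .join("\n");
}
```

The design choice worth copying: the synthesis boundary is structural, not optional. Raw output never has a path to the user.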
The full cycle
Not every project starts with a vague idea. Sometimes you know exactly what needs doing. The PM adapts. Let me show you what a complete cycle looks like.
I pointed it at the authentication system of one of my projects and asked: "Review our auth system. Learn it inside and out and ask yourself: could you show this to Linus Torvalds with a clean conscience?"
Discovery
The PM skipped the uncomfortable questions entirely. The task was clear enough that it went straight to specialist consultation - spawned three agents in parallel: an Explore agent for architecture deep-dive, a security-engineer to think like an attacker, and a code-reviewer for quality and consistency.
I never saw what those agents found. What I got was the PM's synthesis:
PM: "Three independent reviewers examined the system. Honest
assessment: grade B-.
Could you show this to Torvalds? Not yet.
What's actually good (12 findings):
- OTP-only auth, timing-safe comparisons, RLS policies...
What's broken (12 findings):
- CRITICAL: Open redirect, hardcoded fallback signing key,
in-memory rate limiting
- HIGH: Supabase error leaks, missing security headers,
cron secret passed via query parameter
- MEDIUM: Predictable admin tokens, 'as any' TypeScript bypasses
- LOW: Demo callback issues, IP extraction inconsistency
Root cause: Not architecture. The structure is sound.
It's sloppiness in details and missing hardening layers."

B-minus. Not inflated, not false praise. That's the kind of honesty you need from a reviewer and rarely get from AI. Whether the real Torvalds would actually give a grade or just write four paragraphs explaining why I should consider a different career, we'll never know.
Impact analysis
I could see that some of these fixes would cascade through the codebase. Not because I'm clairvoyant, but because I wrote the blueprints for most of these functions myself. When you've defined the function signatures and data shapes, you tend to notice when someone wants to change the plumbing. So I asked the PM to investigate dependencies before planning anything.
This is where the PM shows some judgment about scale. For smaller investigations, it spawns a couple of focused agents and handles things directly. For this one - twelve findings, multiple interconnected systems - it assembled an actual team. Five agents running in parallel, each tracing one critical finding through the dependency chain. It reported progress as they completed: "Rate limiting analysis done. 3 of 5 complete." Once all five reported back, the PM synthesized the impact:
PM: "Dependency analysis complete. Key findings:
Finding 2 (Signing Key): CRON_SECRET serves dual purposes -
cron auth AND cache signing. Separating them means existing
hm_sub_cache cookies invalidate at deploy. 5 min TTL,
auto-refresh. No user-visible breakage.
Finding 3 (Rate Limiting → Redis): Requires async/await
changes in 5 call sites. All sites already in async context -
just needs 'await'. No breaking API changes.
Finding 6 (Cron Secret via Query Param): HIGH RISK.
If removed, all 9 cron jobs stop working. Requires external
coordination with cron-job.org. Deferring to later sprint."

That last one is scope management again. The PM didn't try to fix everything at once. It identified the external dependency, flagged the risk, and deferred it. Same instinct as the notification example, different context.
Planning
With the full picture in hand, the PM split the work into three sprints based on the dependency analysis:
Sprint 1 - Direct execution (no external dependencies):
1.1: Open redirect validation
1.2: Signing key separation
1.3: Error message sanitization
1.4: Security headers
1.5: Timing-safe cron comparisons
Dependency map:
1.1, 1.3 → sequential (same file: auth.ts)
1.2, 1.4, 1.5 → parallel (independent)
Sprint 2 - Requires Upstash account:
2.1: Redis rate limiting
2.2: Admin token nonce
2.3: CSP report-only (depends on Sprint 1.4 deployed)
Sprint 3 - Requires external coordination:
3.1: Cron migration to Bearer token (cron-job.org)
3.2: CSP evaluation after running report-only
3.3: CSP enforcement (after 3.2 validation)
3.4: Project-wide error sanitization (optional)

Sprint 3 exists because the PM refused to bundle everything into one release. The cron migration needs external coordination. CSP enforcement needs real-world data first. These are the kind of decisions a PM makes that an eager developer (specifically: me) would skip.
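The sequencing rule in that dependency map - same file means sequential, independent files can run in parallel - is mechanical enough to sketch. The task list mirrors Sprint 1; the grouping logic is my illustration, not the skill's internals:

```typescript
// Hypothetical sketch of the batching rule: tasks touching the same file run
// in successive batches, everything else shares a batch and runs in parallel.
interface Task {
  id: string;
  file: string;
}

function planBatches(tasks: Task[]): Task[][] {
  // Group tasks by the file they touch, preserving order within each file.
  const byFile = new Map<string, Task[]>();
  for (const t of tasks) {
    const group = byFile.get(t.file) ?? [];
    group.push(t);
    byFile.set(t.file, group);
  }
  // Batch i takes the i-th task from every file group, so same-file tasks
  // land in later batches while different files share the same batch.
  const batches: Task[][] = [];
  for (const group of byFile.values()) {
    group.forEach((t, i) => {
      (batches[i] ??= []).push(t);
    });
  }
  return batches;
}

// Sprint 1 from the plan above: 1.1 and 1.3 both touch auth.ts.
const sprint1: Task[] = [
  { id: "1.1", file: "auth.ts" },
  { id: "1.2", file: "signing.ts" },
  { id: "1.3", file: "auth.ts" },
  { id: "1.4", file: "headers.ts" },
  { id: "1.5", file: "cron.ts" },
];
```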
The plan got stored in two places: Serena memory for quick context retrieval, and a markdown file in claudedocs/pm/ for human-readable documentation. If I close my terminal and come back tomorrow, the PM picks up where we left off. "I see we completed tasks 1-3 yesterday. Task 4 is blocked on your Upstash setup. What's the status?"
Session persistence sounds like a small thing until you've lost context for the third time and have to re-explain your entire project to an AI that has no memory of what you were working on.
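The dual persistence could look something like this sketch. The paths, file names, and state shape are all my assumptions for illustration - the skill's actual storage format (Serena memory plus markdown) isn't shown here:

```typescript
import { mkdirSync, writeFileSync, readFileSync, existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical sketch of dual persistence: a machine-readable state blob for
// quick retrieval, plus a human-readable markdown plan. Paths and shapes are
// assumed, not the skill's real format.
interface PlanState {
  project: string;
  completedTasks: number[];
  blockedOn?: string;
}

const DIR = "claudedocs/pm";

function savePlan(state: PlanState): void {
  mkdirSync(DIR, { recursive: true });
  // JSON copy: what a tool would reload to resume a session.
  writeFileSync(join(DIR, `${state.project}.json`), JSON.stringify(state, null, 2));
  // Markdown copy: what a human reads tomorrow morning.
  const md = [
    `# Plan: ${state.project}`,
    `Completed: ${state.completedTasks.join(", ") || "none"}`,
    state.blockedOn ? `Blocked on: ${state.blockedOn}` : "",
  ].join("\n");
  writeFileSync(join(DIR, `${state.project}.md`), md);
}

function loadPlan(project: string): PlanState | null {
  const path = join(DIR, `${project}.json`);
  return existsSync(path) ? JSON.parse(readFileSync(path, "utf8")) : null;
}
```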
Execution
I gave the go-ahead for Sprint 1. The PM created a team, assigned tasks to the right specialists, ran what it could in parallel, kept the sequential pieces in order.
A key design choice: I never see raw agent output. The PM acts as a filter. Instead of "Agent-3 completed task-7 with 247 lines of output," I get:
PM: "Open redirect and error sanitization are done - handled
sequentially since they touch the same file. Signing key
separation, security headers, and cron comparisons ran in
parallel. All passing.
One thing I caught: 9 cron routes had 2 different patterns for
secret verification. Instead of patching each one, I created a
shared verifyCronSecret() function. Cleaner, and it means future
cron routes get the check for free.
Build passes. Moving to review."

It's the difference between managing a team and watching a build log.
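The shared helper the PM mentions could look roughly like this. `verifyCronSecret` is the name from the PM's report; the body below is my guess at a minimal timing-safe version using Node's crypto, not the actual implementation:

```typescript
import { timingSafeEqual } from "node:crypto";

// Sketch of a shared verifyCronSecret() helper (name from the PM's report;
// the implementation is assumed). Compares in constant time so the check
// doesn't leak how many leading bytes of the secret matched.
export function verifyCronSecret(provided: string | null, expected: string): boolean {
  if (!provided) return false;
  const a = Buffer.from(provided);
  const b = Buffer.from(expected);
  // timingSafeEqual requires equal-length buffers; a length mismatch is a
  // fast reject, which leaks only the length, not the contents.
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}
```

Whatever the real body looks like, the win the PM describes is the same: one shared check instead of 2 divergent patterns across 9 routes.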
Review
After execution, the PM ran a code-reviewer agent against the changes and verified deliverables against the plan's acceptance criteria. The assessment wasn't "everything is great!" It was honest about what's done, what's left, and what needs attention: "Sprint 1 complete. Core hardening is in place. Sprint 2 is blocked on your Upstash account setup. Sprint 3 plan is saved for when you're ready to coordinate with cron-job.org."
The whole thing - from "review my auth" to Sprint 1 complete with a documented plan for Sprints 2 and 3 - happened in a single session.
Scope creep has rules now
One design decision I'm particularly happy with: quantified scope management. Instead of relying on vibes and willpower, the PM has hard thresholds:
Scope increase < 20%: Accept silently, note in plan
Scope increase 20-50%: Warning. "This increases scope significantly.
The impact is: [specific consequences].
Want to proceed?"
Scope increase > 50%: Full stop. "This is essentially a different
project. I recommend we finish the current
scope first, then start a new cycle."

This is the "tells me no" part in practice. When I'm mid-project and suddenly think "oh, we should also add email notifications," the PM doesn't just nod and add it to the list. It says: "That's a 40% scope increase. It means adding an email service, templates, delivery tracking, and unsubscribe handling. The current feature ships two phases later. Want to proceed, or should we finish in-app first?"
Nine times out of ten, the answer is "finish in-app first." But without someone explicitly stopping me to do the math, I would have kept adding things until the project was twice the size and half finished.
That's what the good PMs I've worked with did too. Not "no, you can't have that." More like "sure, but here's what it costs. Still want it?" The math usually speaks for itself.
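Those thresholds are concrete enough to express as a pure function. The real skill states them in prose; this sketch just makes the decision table explicit, with the same numbers as above:

```typescript
// Hypothetical sketch of the scope thresholds as a decision function.
// The percentages match the rules quoted above; everything else is illustration.
type ScopeDecision =
  | { action: "accept"; note: string }
  | { action: "warn"; message: string }
  | { action: "stop"; message: string };

function evaluateScopeChange(increasePct: number): ScopeDecision {
  if (increasePct < 20) {
    return { action: "accept", note: "Minor increase - noted in plan." };
  }
  if (increasePct <= 50) {
    return {
      action: "warn",
      message: `+${increasePct}% scope. Here is the impact - want to proceed?`,
    };
  }
  return {
    action: "stop",
    message: "This is essentially a different project. Finish the current scope first.",
  };
}
```

The value isn't the arithmetic. It's that the decision happens at a fixed threshold instead of whenever willpower runs out.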
Devil's advocate mode
There's an optional mode I use less often but value enormously. When I say "challenge me on this" or "play devil's advocate," the PM drops the diplomatic tone:
You: "I think we should rewrite the sync engine in Rust"
PM (devil's advocate): "Let's examine that.
Your sync takes 17 seconds. You just optimized it from 100.
The bottleneck was database writes, not CPU. Rust won't make
Prisma queries faster.
The actual problems with a rewrite:
- You lose your entire test suite
- Nobody on your team (which is you) has production Rust experience
- The sync code touches 12 integration points
- Your users don't care what language the sync runs in
The strongest case against: you'd spend 2-3 months rewriting
something that already works to gain performance you don't need,
in a language you'd have to learn, while your product roadmap
stalls.
What I'd do instead: if CPU becomes a bottleneck (it hasn't),
extract the hot path into a Rust module via FFI. Keep everything
else in TypeScript."

Ouch. But also: completely correct. Sometimes you need someone to tell you the idea is bad before you spend a month learning that yourself. Default behavior is constructive sounding board. Devil's advocate only activates when you ask. But the ability to flip the switch from "supportive" to "brutally honest" has saved me from more bad ideas than I'd like to admit.
The ecosystem
If you're a Claude Code user, you might be wondering: "Didn't you already have a tech lead that does this?" Fair question. I now have four skills that all push back on me in different ways:
Role What it does What it doesn't do
──────────────────────────────────────────────────────────────────────────────
Tech Lead Technical trade-off analysis Orchestrate teams
"Should we refactor auth?" Manage scope
Reads code, gives opinions Track across sessions
CPO Product strategy, prioritization Touch code. Ever.
"Which feature matters most?" Write implementations
Challenges business assumptions Make technical choices
Requirements Turns vague ideas into PRDs Orchestrate anything
Analyst Stakeholder analysis Execute implementation
Formal requirement specs Manage ongoing work
Project Manager Owns the full lifecycle None of these limits
Challenges → Plans → Orchestrates It does all of it
Spawns the others as consultants

The key insight: in Claude Code, skills can be made available as agents. Which means the PM can spawn the tech lead as a team member to get a trade-off analysis, or bring in the CPO for a product perspective. Each one sees the problem through their specific lens and reports back. The PM synthesizes it all.
In practice: I use the tech lead directly for quick technical questions, the CPO for strategic decisions, and the project manager when something needs the full treatment. The PM coordinates the others as needed. I don't have to manage that myself.
The uncomfortable truth
The original workflow had me as the glue. I managed the process, decided when to move between phases, tracked state in my head, and relied on willpower for scope discipline. The project manager enforces all of it. Same process, but I can't skip steps when I'm tired. That's a meaningful difference when it's 11 PM and you just want to ship the feature.
I built the project manager because I'm bad at project management. Not incompetent - I can plan, prioritize, and ship. But I take shortcuts when I'm excited about an idea. I skip discovery when the solution feels obvious. I let scope creep when a feature is fun to build. I forget where I left off between sessions.
The PM doesn't have those problems. It doesn't get excited. It doesn't take shortcuts because it's late and the code is almost working. It doesn't forget. It just follows the process, every time, and asks the questions I should be asking myself.
Which is what the good project managers I've worked with did too. Not the ones who hid behind process. The ones who actually listened, understood what you were trying to build, and then made sure you didn't sabotage yourself along the way.
I just automated the role because my team is me, and I'm a terrible manager of myself.
The project manager skill is built on Claude Code's skill and agent system. It uses Serena for memory persistence, TeamCreate for agent orchestration, and a healthy amount of markdown. The skill file is below if you want to try it yourself.
Download project-manager skill (.md)
Drop it in ~/.claude/skills/project-manager/SKILL.md and you're set. The tech-lead workflow is still how I work for smaller tasks. The PM is for when things get real.