Reviewed Jan 2026

Documentation-Driven Continuous Refactoring

Learn how to operate, maintain, and evolve live code systems after launch without accumulating technical debt, using Markdown documentation as living specifications and working with AI coding tools like Claude Code, Cursor, and others.

For: vibecoding, Lovable, Cursor, Claude Code, Gemini, Codex

Key Takeaways

  • Markdown documentation serves as executable specification and continuous reference for system behavior
  • Refactor in isolation using feature flags, branch-by-abstraction, and modular deployment strategies
  • Continuous small refactorings prevent the "feature trap" where technical debt stalls development
  • Production observability and testing-in-production practices catch issues before they impact users

Markdown-Driven Development with AI

The moment a system goes live, a new challenge begins: how do you continue improving it without breaking what works? This insight explores proven strategies for operating and evolving live code systems: using Markdown documentation as living specifications and working effectively with AI coding tools like Claude Code, Cursor, Cody, and GitHub Copilot to maintain and evolve your Next.js, React, or Node.js applications.

The Core Problem: Maintaining Systems Built with AI

When building applications with AI coding assistants like Claude Code, Cursor, or GitHub Copilot, development moves incredibly fast. You can generate entire features in minutes. But this speed creates a unique challenge: how do you maintain context and coherence as your codebase evolves?

Ward Cunningham's technical debt metaphor becomes even more relevant: "Shipping first-time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite. The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt."

With AI-assisted development:

  • Code generation is fast, but context can be lost between sessions
  • AI tools work best with clear, written specifications
  • Documentation becomes the primary way to maintain system knowledge
  • Small changes can compound into confusion without proper organization

The statistics remain sobering:

  • Enterprises spend 41% of IT budgets managing technical debt
  • Teams can reach a standstill where new feature development nearly stops
  • AI-generated code without clear specs can create scattered implementations of the same concept

The solution: Markdown documentation as living specifications that both humans and AI can read, understand, and use to maintain consistency across your codebase.

Principle 1: Markdown as Living Specification for AI & Humans

When working with AI coding tools, your Markdown documentation isn't just helpful—it's the primary interface between your intent and the code that gets generated. Well-organized documentation becomes the source of truth that AI tools reference to understand your system and generate consistent code.

Why Markdown for AI-Driven Development

Markdown is the universal format AI tools understand best:

  • Natural language structure that AI can parse and reason about
  • Clear hierarchy (headings, lists, code blocks) that represents system organization
  • Easy to version control alongside code
  • Human-readable, so both you and AI maintain the same understanding

The Level/GitHub approach: Treat Markdown files as the actual specification that AI tools "compile" into working code. Your MAIN.md becomes the single source of truth that describes what the system should do.

Documentation Structure for AI Tools

Create a documentation architecture that AI can navigate:

diagram
project/
├── docs/
│   ├── MAIN.md                    # Overall system specification
│   ├── architecture/
│   │   ├── system-overview.md     # High-level architecture
│   │   ├── data-models.md         # Database schema and types
│   │   └── api-design.md          # API contracts and endpoints
│   ├── features/
│   │   ├── authentication.md      # Auth system specification
│   │   ├── payments.md            # Payment processing spec
│   │   └── notifications.md       # Notification system spec
│   ├── operations/
│   │   ├── deployment.md          # How to deploy
│   │   ├── monitoring.md          # What to monitor and why
│   │   └── rollback.md            # Rollback procedures
│   └── decisions/
│       └── ADR-001-database-choice.md  # Architecture Decision Records
├── src/
│   └── ... (actual code)
└── README.md                      # Quick start guide

Key principles for AI-friendly documentation:

  1. One feature, one specification file: Each major feature has dedicated Markdown docs
  2. Link specifications to code: Use relative paths that AI tools can follow
  3. Explicit success criteria: Define what "working correctly" means
  4. Examples everywhere: Show concrete inputs and expected outputs
  5. Update continuously: When code changes, specification changes too

Spec-Driven Development Pattern with AI

This pattern treats Markdown as the primary source code that AI tools "compile" into working applications.

Example: Feature Specification for AI (features/authentication.md)

markdown
# User Authentication System

## Purpose
Provide secure user authentication with email/password and OAuth providers (Google, GitHub).

## Current Implementation
- Email/password authentication using bcrypt for hashing
- JWT tokens for session management (24-hour expiry)
- Refresh tokens stored in database
- Route protection via middleware

## Architecture
(Architecture diagram placeholder.)

## API Endpoints

### POST /api/auth/register
Creates new user account.

**Request:**
```json
{
  "email": "user@example.com",
  "password": "SecurePass123!",
  "name": "John Doe"
}
```

**Response (Success):**
```json
{
  "user": {
    "id": "usr_abc123",
    "email": "user@example.com",
    "name": "John Doe"
  },
  "token": "eyJhbGciOiJIUzI1NiIs...",
  "refreshToken": "ref_xyz789"
}
```

**Response (Error):**
```json
{
  "error": "Email already exists"
}
```

### POST /api/auth/login
Authenticates existing user.

**Request:**
```json
{
  "email": "user@example.com",
  "password": "SecurePass123!"
}
```

**Response:** Same as register endpoint.

## Database Schema

### users table
```sql
CREATE TABLE users (
  id VARCHAR(255) PRIMARY KEY,
  email VARCHAR(255) UNIQUE NOT NULL,
  password_hash VARCHAR(255) NOT NULL,
  name VARCHAR(255),
  auth_provider VARCHAR(50) DEFAULT 'email',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_users_email ON users(email);
```

### refresh_tokens table
```sql
CREATE TABLE refresh_tokens (
  id VARCHAR(255) PRIMARY KEY,
  user_id VARCHAR(255) NOT NULL,
  token VARCHAR(500) NOT NULL,
  expires_at TIMESTAMP NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

CREATE INDEX idx_refresh_tokens_user_id ON refresh_tokens(user_id);
CREATE INDEX idx_refresh_tokens_token ON refresh_tokens(token);
```

## Implementation Requirements

### Password Security
- Use bcrypt with salt rounds = 12
- Enforce minimum password length: 8 characters
- Require at least: 1 uppercase, 1 lowercase, 1 number
- Hash passwords before storing (NEVER store plain text)

### Token Management
- JWT tokens expire after 24 hours
- Refresh tokens expire after 30 days
- Include user ID and email in JWT payload
- Sign tokens with environment variable JWT_SECRET

### Error Handling
- Return 400 for invalid input
- Return 401 for invalid credentials
- Return 409 for duplicate email
- Return 500 for server errors
- Log all errors with stack traces

### Security Requirements
- Rate limit: 5 login attempts per 15 minutes per IP
- Sanitize all user inputs
- Use HTTPS in production
- Set secure, httpOnly cookies for tokens
- Implement CSRF protection

## Testing Requirements

### Unit Tests
- Password hashing/validation
- Token generation/validation
- Input sanitization

### Integration Tests
- Complete registration flow
- Complete login flow
- Token refresh flow
- Invalid credentials handling
- Rate limiting behavior

### Test Data
```javascript
const validUser = {
  email: "test@example.com",
  password: "TestPass123!",
  name: "Test User"
};

const invalidPasswords = [
  "short",           // Too short
  "NoNumbers!",      // No numbers
  "nonumbers123",    // No uppercase
  "NOLOWERCASE123",  // No lowercase
];
```

## Future Enhancements (Not Current Scope)
- [ ] OAuth integration (Google, GitHub)
- [ ] Two-factor authentication (TOTP)
- [ ] Password reset via email
- [ ] Social login providers
- [ ] Session management dashboard

## Related Documentation
- [API Design Patterns](../architecture/api-design.md)
- [Database Schema](../architecture/data-models.md)
- [Security Requirements](../architecture/security.md)

## AI Tool Guidance
When implementing or modifying this feature:
1. Read this entire specification first
2. Generate code that matches the API contracts exactly
3. Include all error handling cases
4. Write tests before implementing functionality
5. Update this document if implementation details change

Working with AI Coding Tools

Pattern for new features:

  1. Write specification in Markdown first (like above example)
  2. Pass specification to AI tool (Claude Code, Cursor, etc.)
  3. AI generates code matching specification
  4. Review and test generated code
  5. Update specification if requirements change

Example AI interaction:

markdown
You: "Read docs/features/authentication.md and implement the 
registration endpoint according to the specification."

AI: [Reads specification, generates code]
- Creates /api/auth/register endpoint
- Implements password validation
- Adds bcrypt hashing
- Creates database queries
- Includes error handling
- Writes unit tests

You: "Now implement the login endpoint."

AI: [Reads same specification]
- Creates /api/auth/login endpoint  
- Reuses password validation logic
- Implements token generation
- Includes rate limiting
- Writes integration tests

The AI maintains consistency because both endpoints reference the same specification document.

Living Documentation Pattern

Documentation should reflect current system state at all times.

Code-to-Docs connection:

javascript
/**
 * User Authentication Service
 * 
 * @see docs/features/authentication.md for full specification
 * @see docs/architecture/security.md for security requirements
 * 
 * This service implements the authentication system as specified.
 * Any changes to auth behavior should update the specification first.
 */
export class AuthService {
  /**
   * Register new user
   * @see docs/features/authentication.md#post-apiauthregister
   */
  async register(email: string, password: string, name: string) {
    // Implementation matches specification
  }
  
  /**
   * Authenticate user
   * @see docs/features/authentication.md#post-apiauthlogin
   */
  async login(email: string, password: string) {
    // Implementation matches specification
  }
}

Docs-to-Code connection (in authentication.md):

markdown
## Implementation

Current implementation: [src/services/auth.service.ts](../../src/services/auth.service.ts)

Tests: [src/services/auth.service.test.ts](../../src/services/auth.service.test.ts)

Last updated: 2024-01-15

The continuous loop:

  1. Specification describes desired behavior
  2. AI generates code matching specification
  3. Code includes links back to specification
  4. Specification includes links to code
  5. When code changes, specification updates
  6. When requirements change, specification updates first

Documentation for System Evolution

As your system evolves, documentation guides AI tools to make consistent changes.

Example: Adding OAuth support

Update docs/features/authentication.md:

markdown
## OAuth Implementation (Added 2024-01-20)

### New API Endpoints

#### GET /api/auth/google
Initiates Google OAuth flow.

**Response:**
Redirects to Google OAuth consent page.

#### GET /api/auth/google/callback
Handles Google OAuth callback.

**Query Parameters:**
- `code`: OAuth authorization code
- `state`: CSRF protection token

**Response:**
Redirects to application with JWT token.

### Updated Database Schema
```sql
ALTER TABLE users 
ADD COLUMN google_id VARCHAR(255),
ADD COLUMN github_id VARCHAR(255);

CREATE UNIQUE INDEX idx_users_google_id ON users(google_id);
CREATE UNIQUE INDEX idx_users_github_id ON users(github_id);
```

### Implementation Notes
- Use passport.js GoogleStrategy
- Store OAuth tokens in separate table if refresh needed
- Handle account linking (existing email + OAuth)

Then prompt AI:

markdown
"Read the updated authentication.md specification and implement 
the new OAuth endpoints as specified."

AI reads the specification and implements consistently with existing patterns because the documentation provides complete context.

Principle 2: Refactor Continuously in Isolation

The key to maintaining live systems is refactoring continuously in small, isolated chunks rather than waiting for "big refactoring sprints."

The Isolation Pattern

Core concept: Change one thing at a time, test it thoroughly, then deploy it independently of other changes.

Three isolation strategies:

1. Feature Flags for Behavior Changes

Feature flags let you deploy code that's "off" by default, then enable it incrementally:

Example implementation using environment variables:

javascript
// config/features.js
export const features = {
  newCheckout: process.env.FEATURE_NEW_CHECKOUT === 'true',
  aiSuggestions: process.env.FEATURE_AI_SUGGESTIONS === 'true',
  darkMode: process.env.FEATURE_DARK_MODE === 'true'
};

// In component
import { features } from '@/config/features';

export function CheckoutPage() {
  if (features.newCheckout) {
    return <NewCheckout />;
  }
  return <OldCheckout />;
}

When to use feature flags:

  • New features that change user-facing behavior
  • A/B testing different approaches
  • Gradual rollouts to subset of users
  • Quick kill-switch capability if issues arise

Important: Feature flags are temporary. Remove them after full rollout (typically within 2-4 weeks).

2. Branch by Abstraction for Refactoring

When refactoring core systems that can't use feature flags, use branch by abstraction:

Step 1: Create Abstraction Layer

javascript
// Before: Direct database calls everywhere
await database.query('SELECT * FROM users WHERE id = ?', [userId]);

// After: Introduce abstraction
class UserRepository {
  async findById(userId) {
    return await database.query('SELECT * FROM users WHERE id = ?', [userId]);
  }
}

Step 2: Migrate All Callers

Replace direct database calls with repository methods throughout the codebase.
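A minimal sketch of one migrated caller, assuming an Express-style handler (the route and the userRepository instance are illustrative):

javascript
// Before: the handler talks to the database directly
app.get('/api/users/:id', async (req, res) => {
  const rows = await database.query('SELECT * FROM users WHERE id = ?', [req.params.id]);
  res.json(rows[0]);
});

// After: the handler depends only on the abstraction
const userRepository = new UserRepository();

app.get('/api/users/:id', async (req, res) => {
  const user = await userRepository.findById(req.params.id);
  res.json(user);
});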

Step 3: Refactor Behind Abstraction

javascript
class UserRepository {
  async findById(userId) {
    // New implementation using ORM or different database
    return await orm.User.findByPk(userId);
  }
}

Step 4: Remove Abstraction (Optional)

If the abstraction was only needed for the migration, consider removing it once the migration is complete.

This pattern works especially well with AI tools: you can prompt the AI to "refactor using the repository pattern" and point it to the specification document that describes the pattern.

3. Parallel Component Development

For significant changes, build the new version alongside the old:

diagram
app/
├── components/
│   ├── checkout/          # Old version (still in use)
│   │   ├── CheckoutForm.js
│   │   └── PaymentStep.js
│   └── checkout-v2/       # New version (being developed)
│       ├── CheckoutForm.js
│       └── PaymentStep.js

Route traffic based on feature flag or user segment. Both versions live in production simultaneously.
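A minimal routing sketch, reusing the features module from the feature flag example (the import paths and default exports are assumptions):

javascript
import { features } from '@/config/features';
import CheckoutFormV1 from '@/components/checkout/CheckoutForm';
import CheckoutFormV2 from '@/components/checkout-v2/CheckoutForm';

export function Checkout(props) {
  // Both versions ship to production; the flag decides which one renders
  return features.newCheckout ? <CheckoutFormV2 {...props} /> : <CheckoutFormV1 {...props} />;
}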

Refactoring Decision Framework

When to refactor:

  1. Before adding related features: Refactor the area you're about to modify
  2. When you touch code 3+ times: The third time you touch the same code, clean it up
  3. When onboarding reveals confusion: New team members struggle to understand an area
  4. When bugs cluster: Multiple bugs in the same module indicate a design issue
  5. When tests are painful to write: Hard-to-test code needs refactoring

Refactoring size guidelines:

  • Small (1-3 hours): Rename variables, extract functions, simplify conditionals
  • Medium (1-2 days): Restructure module, introduce abstraction, extract component
  • Large (1 week): Migrate to new pattern, replace core dependency, architectural change

Important: Large refactorings should be broken into smaller changes deployed incrementally.

Testing Refactorings in Isolation

The golden rule: Write tests before refactoring, keep tests passing during refactoring.

Test strategy:

  1. Characterization tests: Document current behavior before changing it
  2. Regression tests: Ensure refactored code behaves identically
  3. Integration tests: Verify refactored component works with rest of system
  4. Production monitoring: Watch metrics after deploying refactored code

Example test pattern:

javascript
describe('User Authentication', () => {
  it('maintains existing behavior after refactoring', async () => {
    // Arrange: Set up test user
    const user = await createTestUser();
    
    // Act: Login with both old and new system
    const resultOld = await authService.login(user.email, user.password);
    const resultNew = await authServiceV2.login(user.email, user.password);
    
    // Assert: Both produce identical results
    expect(resultNew).toEqual(resultOld);
  });
});

Principle 3: Zero-Downtime Deployment Strategies

Users expect systems to be available 24/7. Modern deployment strategies make this possible.

Deployment Pattern Comparison

Blue-Green Deployment

Concept: Maintain two identical environments. Switch traffic instantly between them.

Implementation steps:

  1. Blue environment: Current production version serving all traffic
  2. Green environment: Deploy new version, run tests, keep idle
  3. Validation: Verify green environment works correctly (smoke tests)
  4. Switch: Update load balancer to route to green environment
  5. Monitor: Watch metrics closely for 15-30 minutes
  6. Rollback or commit: Either switch back to blue, or decommission blue

Example configuration (conceptual):

yaml
load_balancer:
  active_environment: green
  blue:
    servers:
      - blue-server-1
      - blue-server-2
    version: v1.4.5
  green:
    servers:
      - green-server-1
      - green-server-2
    version: v1.5.0

Key benefit: Instant rollback. If issues arise, switching back takes seconds.

Working with AI tools for blue-green: Your Markdown documentation should specify:

markdown
# Deployment: Blue-Green Strategy

## Infrastructure
- Blue environment: production-blue.example.com
- Green environment: production-green.example.com  
- Load balancer: Routes traffic based on DNS

## Deployment Process
1. Deploy new version to idle environment (green)
2. Run smoke tests against green environment
3. Switch DNS to point to green environment
4. Monitor for 30 minutes
5. If issues: Switch DNS back to blue
6. If stable: Decommission blue environment

## Verification Steps
- [ ] All API endpoints return 200
- [ ] Database connections successful
- [ ] Critical user flows complete
- [ ] No errors in monitoring

Then AI tools can help you script the deployment process based on this specification.
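For instance, the verification checklist could become a small smoke-test script that runs against the idle environment before the switch. A sketch, assuming Node 18+ for the global fetch (the URL and endpoint list are placeholders):

javascript
// Runs against the idle (green) environment before the traffic switch
const BASE_URL = process.env.SMOKE_TEST_URL || 'https://production-green.example.com';
const endpoints = ['/api/health', '/'];

async function smokeTest() {
  for (const path of endpoints) {
    const response = await fetch(`${BASE_URL}${path}`);
    if (!response.ok) {
      console.error(`FAIL ${path}: status ${response.status}`);
      process.exit(1);
    }
    console.log(`OK   ${path}`);
  }
  console.log('Smoke tests passed; safe to switch traffic.');
}

smokeTest();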

Canary Deployment

Concept: Release new version to small percentage of users, gradually increase.

Typical rollout schedule:

  • Day 1: 1-5% of users (canary cohort)
  • Day 2: If no issues, increase to 25%
  • Day 3: Increase to 50%
  • Day 4: Increase to 100%

Implementation with feature flags:

javascript
// Determine if user gets new version
function shouldUseNewVersion(user) {
  const canaryPercentage = getCanaryPercentage('checkout-v2');
  const userHash = hashUserId(user.id);
  return (userHash % 100) < canaryPercentage;
}

// In application code
if (shouldUseNewVersion(currentUser)) {
  return <CheckoutV2 />;
} else {
  return <CheckoutV1 />;
}
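The two helpers above are left undefined in the snippet. A minimal sketch, assuming the rollout percentage comes from an environment variable (the naming scheme is an assumption) and that the hash is deterministic so each user stays in the same cohort between requests:

javascript
import { createHash } from 'crypto';

// Percentage of users (0-100) who should see the new version
function getCanaryPercentage(flagName) {
  const value = process.env['CANARY_' + flagName.replace(/-/g, '_').toUpperCase()];
  return value ? Number(value) : 0;
}

// Deterministic hash so the same user always lands in the same bucket
function hashUserId(userId) {
  const digest = createHash('sha256').update(String(userId)).digest('hex');
  return parseInt(digest.slice(0, 8), 16);
}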

Monitoring during canary:

  • Compare error rates between canary and control groups
  • Monitor performance metrics (latency, completion rates)
  • Watch for user complaints or support tickets
  • Track conversion rates or business metrics

Decision criteria:

  • Proceed: Canary metrics match or exceed control group
  • Hold: Minor issues, investigate before expanding
  • Rollback: Significant degradation, disable for canary users

AI-assisted canary documentation:

markdown
# Canary Deployment: New Recommendation Engine

## Rollout Schedule
- Day 1 (2024-01-20): 5% of users
- Day 2 (2024-01-21): 25% of users  
- Day 3 (2024-01-22): 50% of users
- Day 4 (2024-01-23): 100% rollout

## Success Metrics
- Click-through rate >= baseline (15%)
- API response time <= 200ms (p95)
- Error rate <= 0.5%
- User satisfaction score >= 4.2/5

## Monitoring Queries
```sql
-- Compare error rates
SELECT 
  version,
  COUNT(*) as requests,
  SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END) as errors,
  (SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END) * 100.0 / COUNT(*)) as error_rate
FROM api_logs
WHERE created_at > NOW() - INTERVAL '1 hour'
GROUP BY version;
```

## Rollback Procedure
1. Set CANARY_PERCENTAGE=0 in environment
2. Restart application servers
3. Verify all traffic routes to stable version
4. Investigate canary issues before retry

AI tools can generate monitoring dashboards and alerting logic based on this specification.
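As one example, the success metrics above translate directly into a check an alerting job could run each hour. A sketch; fetchCanaryMetrics is a placeholder for however you query your metrics store:

javascript
// Thresholds copied from the canary specification above
const thresholds = {
  clickThroughRate: 0.15,   // must stay at or above baseline
  p95ResponseTimeMs: 200,   // must stay at or below
  errorRate: 0.005          // must stay at or below
};

async function evaluateCanary(fetchCanaryMetrics) {
  const metrics = await fetchCanaryMetrics();
  const failures = [];

  if (metrics.clickThroughRate < thresholds.clickThroughRate) failures.push('click-through rate below baseline');
  if (metrics.p95ResponseTimeMs > thresholds.p95ResponseTimeMs) failures.push('p95 latency above 200ms');
  if (metrics.errorRate > thresholds.errorRate) failures.push('error rate above 0.5%');

  return failures.length === 0 ? { proceed: true } : { proceed: false, failures };
}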

Rolling Deployment

Concept: Gradually replace instances with new version, one at a time.

Process:

  1. Take one server out of load balancer pool
  2. Deploy new version to that server
  3. Run health checks to verify it's working
  4. Add server back to pool
  5. Repeat for next server
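A sketch of that loop in script form; every helper here (load balancer control, deploy, health check) is a stand-in for whatever your infrastructure provides:

javascript
// All four helpers are hypothetical: wire them to your load balancer and deploy tooling
async function rollingDeploy(servers, version, { removeFromPool, deployVersion, healthCheck, addToPool }) {
  for (const server of servers) {
    await removeFromPool(server);               // 1. drain traffic from this instance
    await deployVersion(server, version);       // 2. install the new version
    const healthy = await healthCheck(server);  // 3. verify before re-adding
    if (!healthy) {
      throw new Error(`Health check failed on ${server}; pausing rollout`);
    }
    await addToPool(server);                    // 4. return it to the pool
  }                                             // 5. repeat for the next server
}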

Advantages:

  • No additional infrastructure needed
  • Smoother than instant switch
  • Can pause at any point

Disadvantages:

  • Slower rollback (must roll forward or backward progressively)
  • Multiple versions running simultaneously
  • Requires backward-compatible changes
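Because old and new instances serve traffic at the same time, every change must be readable by both versions. A common pattern is a dual-read during a field rename; the field names here are purely illustrative:

javascript
// Old instances still write fullName; new instances write name.
// Reading both keeps the two versions compatible during the rollout.
function getDisplayName(userRecord) {
  return userRecord.name ?? userRecord.fullName ?? 'Unknown user';
}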

Feature Flag Safety Patterns

Feature flags enable safe production changes but require discipline:

Best practices:

  1. Limit flag scope: Isolate changes to smallest possible code area
  2. Short-lived flags: Remove within 2-4 weeks of full rollout
  3. Clear ownership: Each flag has owner responsible for removal
  4. Test all states: Automated tests cover flag on/off scenarios (see the sketch after this list)
  5. Fallback values: Define safe defaults if flag system fails
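A minimal sketch of point 4, assuming a Jest-style test runner and the features module from earlier (the component path is an assumption):

javascript
describe('CheckoutPage feature flag', () => {
  afterEach(() => jest.resetModules());

  it('renders the new checkout when the flag is on', () => {
    jest.doMock('@/config/features', () => ({ features: { newCheckout: true } }));
    const { CheckoutPage } = require('@/components/CheckoutPage');
    // ...render CheckoutPage and assert the new flow appears
  });

  it('renders the old checkout when the flag is off', () => {
    jest.doMock('@/config/features', () => ({ features: { newCheckout: false } }));
    const { CheckoutPage } = require('@/components/CheckoutPage');
    // ...render CheckoutPage and assert the old flow appears
  });
});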

Anti-patterns to avoid:

  • ❌ Permanent flags that never get removed (creates complexity)
  • ❌ Nested flags (if flag A is on, check flag B, check flag C...)
  • ❌ Testing all combinations (exponential complexity)
  • ❌ Using flags for refactoring (use branch-by-abstraction instead)

Flag hygiene checklist:

markdown
# Feature Flag: new_dashboard_layout

- [ ] Flag created with clear purpose and timeline
- [ ] Tests written for both enabled and disabled states
- [ ] Monitoring configured for key metrics
- [ ] Rollout plan documented (1% → 10% → 50% → 100%)
- [ ] Removal date set (2 weeks after 100% rollout)
- [ ] Owner assigned for flag lifecycle

Ready to scope something we can stand behind long-term?

Start with a Scope Pack. If it’s a fit, we’ll build — and remain accountable as it evolves.