Create Model Cards and System Cards: AI Documentation Templates

TL;DR: Model cards document individual models—their capabilities, limitations, and intended use. System cards document how models are deployed in production systems. Together, they form the documentation foundation for responsible AI.

When something goes wrong with an AI system, the first question is: "What were you thinking?" The second question is: "What was this thing supposed to do?"

Model cards and system cards answer these questions.

Pro tip: For high-risk systems, the EU AI Act requires "technical documentation" that overlaps heavily with model card and system card content. Adopt these formats now and you get a head start on compliance documentation.

Model Cards: What the Model Does

A model card is documentation for a machine learning model. Think of it as a nutrition label for AI—what's in it, what it's for, and what to watch out for.

Essential Sections

Model Details

• Name and version
• Architecture type
• Training date
• Developers/owners

Intended Use

• Primary use cases
• Intended users
• Out-of-scope uses
• Known limitations

Training Data

• Data sources
• Data preprocessing
• Data demographics
• Known gaps

Performance

• Metrics and benchmarks
• Performance across groups
• Failure modes
• Confidence calibration
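
The four sections above can be captured as a structured record so that gaps are easy to spot. The following is a minimal sketch, not a standard schema: the `ModelCard` class, its field names, and the example values are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class ModelCard:
    """Hypothetical schema mirroring the four sections listed above."""

    # Model Details
    name: str
    version: str
    architecture: str
    training_date: str  # ISO 8601 date string
    owners: list[str]

    # Intended Use
    primary_use_cases: list[str] = field(default_factory=list)
    intended_users: list[str] = field(default_factory=list)
    out_of_scope_uses: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)

    # Training Data
    data_sources: list[str] = field(default_factory=list)
    data_preprocessing: list[str] = field(default_factory=list)
    data_demographics: dict[str, str] = field(default_factory=dict)
    known_gaps: list[str] = field(default_factory=list)

    # Performance
    metrics: dict[str, float] = field(default_factory=dict)
    performance_by_group: dict[str, dict[str, float]] = field(default_factory=dict)
    failure_modes: list[str] = field(default_factory=list)
    calibration_notes: str = ""

    def missing_sections(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Illustrative values only; none of these describe a real model.
card = ModelCard(
    name="skin-condition-classifier",
    version="1.2.0",
    architecture="convolutional neural network",
    training_date="2024-01-15",
    owners=["ml-platform-team"],
    primary_use_cases=["Triage support for dermatology referrals"],
    intended_users=["Clinical operations staff"],
    out_of_scope_uses=["Unsupervised medical diagnosis"],
    known_limitations=["Not evaluated on low-resolution images"],
)

print(card.missing_sections())
```

A check like `missing_sections()` could run in review or CI to flag cards that were started but never completed.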

The Disaggregated Metrics Principle

A model that's "95% accurate overall" might be:

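• 98% accurate for majority populations
• 75% accurate for minority populations

The overall figure hides the gap because the smaller group contributes few examples to the average.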

Model cards should report performance disaggregated by:

• Demographics (age, gender, race, location)
• Use case variations
• Input quality levels
• Edge cases
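
To see how a 95% aggregate can coexist with a 75% subgroup, here is a small sketch that computes accuracy overall and per group. The `accuracy_by_group` helper, the group names, and the counts are assumptions chosen to reproduce the numbers above; they are synthetic, not results from a real model.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute overall and per-group accuracy.

    records: iterable of (group, y_true, y_pred) tuples.
    Returns (overall_accuracy, {group: accuracy}).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    if not total:
        raise ValueError("records must not be empty")
    per_group = {group: correct[group] / total[group] for group in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_group


# Synthetic data: 2,000 majority examples (1,960 correct)
# and 300 minority examples (225 correct).
records = (
    [("majority", 1, 1)] * 1960
    + [("majority", 1, 0)] * 40
    + [("minority", 1, 1)] * 225
    + [("minority", 1, 0)] * 75
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.0%}")  # overall: 95%
for group, acc in per_group.items():
    print(f"{group}: {acc:.0%}")  # majority: 98%, minority: 75%
```

The same grouping key could be swapped for use case, input quality level, or an edge-case flag to cover the other dimensions in the list.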

System Cards: How the Model Is Used

A system card documents the production deployment—not just the model, but everything around it.

System Card vs Model Card

Model Card

"This model classifies images of skin conditions with 92% accuracy on dermatology benchmarks."

System Card

"This system uses the model to triage dermatology referrals. It requires human confirmation for all malignancy predictions and routes uncertain cases to specialists."

System Card Components

flowchart TB
    subgraph SYSTEM["System Card"]
        SC1[System Purpose]
        SC2[Model Integration]
        SC3[Human Oversight]
        SC4[Operational Constraints]
        SC5[Monitoring & Alerts]
        SC6[Incident Response]
    end

System Purpose: What business problem does this system solve?
Model Integration: How is the model used within the larger system?
Human Oversight: What human review processes are in place?
Operational Constraints: What thresholds, rate limits, and fallbacks apply?
Monitoring & Alerts: What's tracked? What triggers alerts?
Incident Response: What happens when things go wrong?
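
One way to make these components concrete is to store them as structured data and let the deployed code read its constraints from the card. The sketch below reuses the dermatology triage example; the `SystemCard` and `OperationalConstraints` classes, the `route_referral` function, the 0.80 threshold, and the rule that malignancy predictions take precedence over the confidence check are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OperationalConstraints:
    confidence_threshold: float  # below this, a prediction counts as uncertain
    max_requests_per_minute: int
    fallback: str  # what happens when the model is unavailable


@dataclass
class SystemCard:
    """Hypothetical schema with one field per component listed above."""

    system_purpose: str
    model_integration: str
    human_oversight: list[str]
    operational_constraints: OperationalConstraints
    monitoring: dict[str, str]  # tracked metric -> alert condition
    incident_response: str


def route_referral(card: SystemCard, label: str, confidence: float) -> str:
    """Apply the oversight rules from the dermatology example."""
    # Assumed precedence: malignancy predictions always go to a human,
    # regardless of model confidence.
    if label == "malignant":
        return "human_confirmation"
    if confidence < card.operational_constraints.confidence_threshold:
        return "specialist_review"
    return "standard_queue"


# Illustrative values only.
card = SystemCard(
    system_purpose="Triage dermatology referrals",
    model_integration="Classifier scores each referral image before scheduling",
    human_oversight=[
        "Human confirmation for all malignancy predictions",
        "Uncertain cases routed to specialists",
    ],
    operational_constraints=OperationalConstraints(
        confidence_threshold=0.80,
        max_requests_per_minute=60,
        fallback="Send every referral to manual review",
    ),
    monitoring={"specialist_review_rate": "alert if above 30% over 24 hours"},
    incident_response="Pause automated routing and notify the on-call owner",
)

print(route_referral(card, "malignant", 0.99))  # human_confirmation
print(route_referral(card, "benign", 0.55))     # specialist_review
print(route_referral(card, "benign", 0.93))     # standard_queue
```

Reading the threshold from the card rather than hard-coding it keeps the documented constraint and the running behavior from drifting apart.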

The Gap Between Cards

The most dangerous AI systems are those where:

• The model card says "not for medical diagnosis"
• The system card (if it exists) describes a medical diagnosis system

This gap between intended use and actual use is where harm happens. System cards make this gap visible.
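
If both cards tag their uses with a shared vocabulary, this gap can be checked mechanically. The following is a rough sketch under that assumption; the tag names, the dictionary layout, and the `find_use_conflicts` function are hypothetical, and free-text cards would need human review instead.

```python
def find_use_conflicts(model_out_of_scope, system_uses):
    """Return the uses a system declares that the model card rules out.

    Both arguments are sets of tags drawn from the same (assumed) vocabulary.
    """
    return set(model_out_of_scope) & set(system_uses)


# Illustrative cards; the tags are invented for this example.
model_card = {
    "name": "skin-condition-classifier",
    "out_of_scope": {"medical_diagnosis", "pediatric_patients"},
}
system_card = {
    "name": "derm-diagnosis-service",
    "uses": {"medical_diagnosis", "referral_triage"},
}

conflicts = find_use_conflicts(model_card["out_of_scope"], system_card["uses"])
if conflicts:
    print(f"{system_card['name']} conflicts with {model_card['name']}: {sorted(conflicts)}")
```

An empty result does not prove the deployment is appropriate; it only means no declared use contradicts a declared exclusion.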

Who Writes These?

Role	Model Card	System Card
ML Engineers	Primary author	Technical contributor
Product Managers	Use case sections	Primary author
Legal/Compliance	Reviewer	Reviewer
Ethics Review Board	Reviewer	Reviewer

Living Documents

Both model cards and system cards should be:

• Version controlled: Tracked in git alongside code
• Updated: Refreshed when models or systems change
• Accessible: Available to anyone who needs them
• Linked: Model cards referenced by system cards
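
The "Updated" and "Linked" properties can be checked automatically if each system card records the model card name and version it depends on. This is a minimal sketch assuming that convention; the `find_stale_links` function, the dictionary keys, and the example names and versions are hypothetical.

```python
def find_stale_links(system_cards, model_versions):
    """List system cards whose model card reference is missing or outdated.

    system_cards: list of dicts with "name", "model", and "model_version".
    model_versions: dict mapping model card name -> current version.
    """
    issues = []
    for card in system_cards:
        current = model_versions.get(card["model"])
        if current is None:
            issues.append(f"{card['name']}: model card '{card['model']}' not found")
        elif current != card["model_version"]:
            issues.append(
                f"{card['name']}: references {card['model']} "
                f"{card['model_version']}, current is {current}"
            )
    return issues


# Illustrative data only.
model_versions = {"skin-condition-classifier": "1.3.0"}
system_cards = [
    {"name": "derm-triage", "model": "skin-condition-classifier", "model_version": "1.2.0"},
    {"name": "intake-bot", "model": "intent-classifier", "model_version": "0.4.1"},
]

for issue in find_stale_links(system_cards, model_versions):
    print(issue)
```

Run as part of a release checklist or CI job, a check like this would surface system cards that were not refreshed after a model update.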

They're not write-once documents. They evolve with the system.

Regulatory Requirements

The EU AI Act explicitly requires documentation similar to model cards and system cards for high-risk AI systems. The terminology differs, but the requirements overlap:

• Technical documentation (Article 11)
• Record-keeping (Article 12)
• Instructions for use (Article 13)

Organizations that adopt model cards and system cards now will already have much of this documentation in place when the requirements apply to them.

Key Takeaway

Model cards answer "What can this model do?" System cards answer "How are we using it?" Together, they create accountability by making the gap between capability and deployment visible. Start with your highest-risk systems and work outward.

Empress links model cards to operational data automatically. Your static documentation stays connected to live system behavior—making audits, incident response, and compliance verification straightforward.