Astro content collections: the deep-dive I needed before shipping 46 sites

Schemas, slugs, getStaticPaths and the four mistakes that cost me a weekend.

Content collections are the best thing Astro has shipped. They make every MDX file a typed, validated, queryable record. After moving 46 sites onto them, I have strong opinions about the patterns that work — and the four mistakes that bit me.

This is the deep-dive I wish existed when I started.

What collections actually solve

Before collections, every Astro site I touched had a different ad-hoc way of reading frontmatter. Some used Astro.glob(), some loaded a JSON manifest, some hand-rolled a TypeScript types file. All of them rotted. Collections kill the entire category.

The build-time validation is the unsung win. A typo in a frontmatter date field breaks npm run build instead of reaching your visitors.

The Zod schema is half the system

In src/content.config.ts:

import { defineCollection, z } from 'astro:content';

const blog = defineCollection({
  type: 'content',
  schema: z.object({
    title: z.string().min(5).max(120),
    description: z.string().min(20).max(300),
    publishedAt: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
    category: z.enum(['wordpress', 'astro', 'cloudflare', 'indie-hacking']),
    tags: z.array(z.string()).max(8),
    featured: z.boolean().default(false),
  }),
});

export const collections = { blog };

Three details that aren’t obvious from the docs:

Use literal enums for categories, not free strings. The build will refuse a typoed category, and your sidebar filters can index the enum directly.
Set .default() everywhere you can. Authors will forget featured: false. Defaults turn forgotten fields into sensible values instead of validation failures.
Date fields as z.string().regex(...), not z.date(). YAML date parsing is a footgun (some parsers coerce 2026-05-13 to a Date object, some leave it a string); enforce the wire format and parse explicitly in your render code.
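A minimal sketch of what "parse explicitly in your render code" could look like. The helper name parsePublishedAt is hypothetical (it is not part of Astro or Zod); it assumes the YYYY-MM-DD wire format enforced by the schema above and builds the date in UTC so the calendar day does not drift with the build machine's timezone.

```typescript
// Hypothetical helper: not part of Astro or Zod, shown only to illustrate explicit parsing.
const ISO_DAY = /^(\d{4})-(\d{2})-(\d{2})$/;

export function parsePublishedAt(value: string): Date {
  const match = ISO_DAY.exec(value);
  if (!match) {
    throw new Error(`Expected YYYY-MM-DD, got "${value}"`);
  }
  const [, year, month, day] = match;
  // Build the date in UTC so the calendar day does not shift with the local timezone.
  const date = new Date(Date.UTC(Number(year), Number(month) - 1, Number(day)));
  // Date.UTC silently rolls over impossible days, so reject anything that does not round-trip.
  if (date.toISOString().slice(0, 10) !== value) {
    throw new Error(`Not a real calendar date: "${value}"`);
  }
  return date;
}
```

The round-trip check matters because the schema's regex only validates the shape of the string, so a value like 2026-02-31 would pass validation and still not be a real date.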

getCollection vs getEntry — picking the right call

Collections have two read APIs:

getCollection('blog') returns every entry as an array. Use it for index pages, sorting, filtering, computing aggregates.

getEntry('blog', slug) returns one entry by slug. Use it when you need a single, known entry.

For a dynamic route, getEntry is overkill: you’ve already called getCollection inside getStaticPaths to enumerate every slug. Pass the entry as a prop:

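// Frontmatter of the dynamic route file, e.g. src/pages/blog/[slug].astro
import { getCollection } from 'astro:content';

export async function getStaticPaths() {
  const posts = await getCollection('blog');
  return posts.map((post) => ({
    // Strip the extension so the URL is /blog/hello-world, not /blog/hello-world.mdx
    params: { slug: post.id.replace(/\.mdx?$/, '') },
    props: { entry: post },  // <- pass through
  }));
}

// The entry arrives as a prop, so no second lookup is needed.
const { entry } = Astro.props;
const { Content } = await entry.render();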

That single-fetch pattern saves the build-time cost of re-fetching every entry by ID and keeps the dynamic page lean.

The slug-vs-id mistake everyone makes

Astro’s content collection id is the file path relative to the collection root, with the extension. The slug is the URL-friendly identifier you derive from it.

For src/content/blog/hello-world.mdx:

entry.id is "hello-world.mdx"
entry.slug is "hello-world"

This trips people up because Astro 5’s loader-based collections dropped slug in favor of an extension-free id, but in a collection declared with type: 'content', as above, the id field still has the extension. If your getStaticPaths does params: { slug: post.id }, you’ll get URLs like /blog/hello-world.mdx/ and spend an hour wondering why nothing routes.

// Helper that mw.com uses everywhere.
export function entrySlug(id: string): string {
  return id.replace(/\.(mdx|md)$/, '');
}

Fold it into a shared util and never think about it again.

Glob loaders for nested directories

For /how-to/<domain>/<tool>/<slug> content where the slug is multi-segment, the route file is /how-to/[...slug].astro (spread slug). The collection picks it up automatically:

src/content/howto/
  database/mysql/not-recognized-command.mdx
  backend/php/array-functions.mdx

entry.id becomes "database/mysql/not-recognized-command.mdx". Your getStaticPaths strips the extension and passes the multi-segment string to the spread slug. URL: /how-to/database/mysql/not-recognized-command.
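The mapping from nested ids to rest-parameter paths can be sketched in plain TypeScript. The names toSpreadPath and StaticPath are hypothetical; only the id format, the extension-stripping rule, and the /how-to/ prefix come from the text above.

```typescript
// Hypothetical names; the shape mirrors what getStaticPaths returns for a [...slug] route.
type StaticPath = { params: { slug: string } };

export function toSpreadPath(id: string): StaticPath {
  // Strip a trailing .md or .mdx. The slashes stay, which is what a rest parameter expects.
  return { params: { slug: id.replace(/\.(mdx|md)$/, '') } };
}

// The two example files from the directory listing above, as collection ids.
const howtoIds = [
  'database/mysql/not-recognized-command.mdx',
  'backend/php/array-functions.mdx',
];

const howtoPaths = howtoIds.map(toSpreadPath);
// Each slug keeps its directory segments, so the final URL mirrors the folder layout.
const howtoUrls = howtoPaths.map((p) => `/how-to/${p.params.slug}`);
```

In a real route file the params object would be returned from getStaticPaths alongside props: { entry }, as in the prop-passing pattern earlier.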

The four mistakes I made

Worth writing down so you don’t repeat them.

The patterns that scaled across 46 sites

The reusable bits, lifted into @empire/content-shells:

sharedFrontmatter — title, description, publishedAt, hero (object, not string), schemaType.
Per-pattern schemas — deepDiveSchema, tutorialSchema, howToSchema, snippetSchema, noteSchema. Each enforces a type literal.
A discriminated union (anyContentSchema) — for sites that want one collection to hold multiple content types, the union enforces the right field set per type.
Helper functions — byPublishedDesc, groupByCategory, uniqueTags, entrySlug, formatBlogDate. Boring, reused everywhere.
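The post names these helpers without showing them, so what follows is a guess at three of them, not the contents of @empire/content-shells. The PostEntry interface is a hypothetical stand-in for the entry type Astro generates, and it assumes publishedAt is a YYYY-MM-DD string as in the schema earlier.

```typescript
// Hypothetical stand-in for a collection entry; the real type comes from astro:content.
interface PostEntry {
  id: string;
  data: { title: string; publishedAt: string; category: string; tags: string[] };
}

// Newest first. YYYY-MM-DD strings sort chronologically as plain text, so no Date parsing is needed.
export function byPublishedDesc(a: PostEntry, b: PostEntry): number {
  if (a.data.publishedAt === b.data.publishedAt) return 0;
  return a.data.publishedAt < b.data.publishedAt ? 1 : -1;
}

// Bucket entries by category, preserving the input order within each bucket.
export function groupByCategory(entries: PostEntry[]): Map<string, PostEntry[]> {
  const groups = new Map<string, PostEntry[]>();
  for (const entry of entries) {
    const bucket = groups.get(entry.data.category);
    if (bucket) {
      bucket.push(entry);
    } else {
      groups.set(entry.data.category, [entry]);
    }
  }
  return groups;
}

// Every tag used across the entries, de-duplicated and sorted alphabetically.
export function uniqueTags(entries: PostEntry[]): string[] {
  const seen = new Set<string>();
  for (const entry of entries) {
    for (const tag of entry.data.tags) seen.add(tag);
  }
  return Array.from(seen).sort();
}
```

Usage is a one-liner per page: posts.sort(byPublishedDesc) for an index, groupByCategory(posts) for a sidebar, uniqueTags(posts) for a tag cloud.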

The schema is the API contract between you-the-author and you-the-templater. Get it boringly precise and the rest of the system stays calm.

— me, after the third site

Where collections are still rough

The honest list of things that aren’t great:

No incremental rebuild on schema change. Update the schema, every page rebuilds. For a 200-post blog that’s tens of seconds.
getCollection is eager. It loads every entry, then filters. Fine at hundreds of entries, painful at thousands.
Live data isn’t a collection. Anything from an API has to live outside the collection abstraction. There’s a loader API for this but it’s young.
The error messages on invalid frontmatter are accurate but cryptic. Plan to read them three times before they make sense.
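On the eager-loading point: getCollection accepts an optional filter callback as its second argument. A minimal sketch of a reusable predicate follows; the names FilterableEntry and isFeaturedIn are hypothetical, and the fields come from the schema earlier. Per the point above, this narrows the array you get back rather than the loading work behind it.

```typescript
// Hypothetical stand-in for the entry shape; the real type comes from astro:content.
interface FilterableEntry {
  id: string;
  data: { featured: boolean; category: string };
}

// Returns a predicate that keeps only featured entries in the given category.
export function isFeaturedIn(category: string) {
  return (entry: FilterableEntry): boolean =>
    entry.data.featured && entry.data.category === category;
}

// In an Astro file the predicate would be passed as the second argument:
//   const featured = await getCollection('blog', isFeaturedIn('astro'));
// Here it runs against a plain array so the sketch stays self-contained.
const filterSample: FilterableEntry[] = [
  { id: 'one.mdx', data: { featured: true, category: 'astro' } },
  { id: 'two.mdx', data: { featured: false, category: 'astro' } },
  { id: 'three.mdx', data: { featured: true, category: 'wordpress' } },
];

const featuredAstro = filterSample.filter(isFeaturedIn('astro'));
```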

None of these is a deal-breaker. All of them get better every release.

What to do tomorrow

If you’re starting a new Astro site this week:

Stand up src/content.config.ts with one collection.
Define the schema with explicit Zod validation. Use .default() aggressively.
Build the [...slug].astro route using the prop-passing pattern above.
Add a byPublishedDesc helper to src/lib/. You’ll use it five times in a week.
When you stand up a second Astro site, extract the shared schema bits into a workspace package before the third one comes online.

That’s the whole loop. Content collections will be the most enjoyable part of your Astro stack — once you stop fighting the shape.