Small: MiniSearch / Lunr

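For a catalogue of hundreds to low thousands of screenplays, an in-browser search index works well. The build script below flattens each file into one search document and writes the serialised index to index.json: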

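import MiniSearch from 'minisearch';
import fs from 'node:fs/promises';
import { glob } from 'glob';

// Read every screenplay file and flatten it into one search document.
const files = await glob('screenplays/*.json');
const docs = await Promise.all(files.map(async (f) => {
  const d = JSON.parse(await fs.readFile(f, 'utf8'));
  return {
    id: d.id,
    title: d.title.en,
    logline: d.logline?.en, // optional; MiniSearch skips missing fields
    characters: d.characters.map((c) => c.name).join(' '),
    // Concatenate the English text of every dialogue element in every scene.
    dialogue: d.document.scenes
      .flatMap((s) => s.body)
      .filter((el) => el.type === 'dialogue')
      .map((el) => el.text.en)
      .join(' '),
  };
}));

// `fields` are indexed for search; `storeFields` are returned with each hit.
// Pass the same options to MiniSearch.loadJSON when loading the index in the browser.
const mini = new MiniSearch({
  fields: ['title', 'logline', 'characters', 'dialogue'],
  storeFields: ['title', 'logline'],
});
mini.addAll(docs);

// Serialise the index so the browser can fetch it as a static file.
await fs.writeFile('index.json', JSON.stringify(mini.toJSON()));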
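
The mapping in the build script assumes every file carries title.en, a characters array, and document.scenes, so a single incomplete file will throw. A minimal sketch of a more tolerant version follows, assuming the same field layout as the script above; the helper name toSearchDoc and its lang parameter are hypothetical and not part of any library.

```javascript
// Hypothetical helper: flatten one parsed screenplay into a search document.
// Field names mirror the build script above; missing fields become ''.
function toSearchDoc(d, lang = 'en') {
  const scenes = d.document?.scenes ?? [];

  // Collect the text of every dialogue element, skipping empty entries.
  const dialogue = scenes
    .flatMap((s) => s.body ?? [])
    .filter((el) => el.type === 'dialogue')
    .map((el) => el.text?.[lang] ?? '')
    .filter(Boolean)
    .join(' ');

  return {
    id: d.id,
    title: d.title?.[lang] ?? '',
    logline: d.logline?.[lang] ?? '',
    characters: (d.characters ?? [])
      .map((c) => c.name)
      .filter(Boolean)
      .join(' '),
    dialogue,
  };
}
```

With this helper, the body of the files.map callback could reduce to parsing the file and returning toSearchDoc(d).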

Mid: Meilisearch / Typesense

For low-latency faceted search across a larger catalogue, run a dedicated search server:

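import { MeiliSearch } from 'meilisearch';

// Add an apiKey option here if the server is started with a master key.
const client = new MeiliSearch({ host: 'http://localhost:7700' });
const index = client.index('screenplays');

// Settings updates and document additions are queued as asynchronous tasks on the server.
await index.updateSearchableAttributes(['title', 'logline', 'characters', 'dialogue']);
await index.updateFilterableAttributes(['genre', 'themes', 'heading_contexts', 'heading_times']);

// `docs` is the array built in the MiniSearch script above. For the filters to
// match anything, each document must also carry the filterable fields.
await index.addDocuments(docs);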
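
Once the filterable attributes are set, facet selections from a UI have to be turned into a Meilisearch filter expression. The sketch below shows one way to do that using the array form of the filter search parameter, where inner arrays are OR-ed and the outer array is AND-ed. The helper name buildFilter and the example values are hypothetical.

```javascript
// Hypothetical helper: turn a facet selection such as
//   { genre: ['crime', 'film noir'], heading_times: ['NIGHT'] }
// into the array form of Meilisearch's `filter` search parameter.
// Values within one attribute are OR-ed; different attributes are AND-ed.
function buildFilter(selection) {
  // Quote every value so that values containing spaces stay intact.
  const quote = (v) =>
    `"${String(v).replace(/\\/g, '\\\\').replace(/"/g, '\\"')}"`;

  return Object.entries(selection)
    .filter(([, values]) => values.length > 0) // ignore facets with nothing selected
    .map(([attr, values]) => values.map((v) => `${attr} = ${quote(v)}`));
}

// Usage against the index defined above (not executed here):
//   await index.search('heist', {
//     filter: buildFilter({ genre: ['crime'], heading_times: ['NIGHT'] }),
//     facets: ['genre', 'themes'],
//   });
```

Only attributes listed in updateFilterableAttributes can appear in a filter; anything else is rejected by the server.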

Big: Elasticsearch

See How-to: Store ScreenJSON in Elasticsearch.

Semantic: vector DB

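For “find me scenes that feel like this” queries, pair a full-text index with a vector DB populated from generated embeddings. Hybrid retrieval, which merges the keyword and vector result lists, usually beats either approach alone, because keyword search catches exact names and phrases while embeddings catch similarity in tone and meaning.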
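
The text above does not prescribe how to merge the two result lists. One common, library-independent option is reciprocal rank fusion (RRF), which needs only the ranked IDs from each source and no score normalisation. A minimal sketch, assuming each retriever returns an array of document or scene IDs ordered best first; the function name and the sample IDs are hypothetical, and k = 60 is the conventional default constant.

```javascript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// where rank starts at 1. Items ranked well by several lists rise to the top.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ids of rankings) {
    ids.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Sort by fused score, highest first, and return only the IDs.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical result lists from a full-text index and a vector DB.
const keywordHits = ['a', 'b', 'c'];
const vectorHits = ['b', 'd', 'a'];
const fused = reciprocalRankFusion([keywordHits, vectorHits]);
```

Because RRF works on ranks rather than raw scores, it avoids having to reconcile a text-relevance score with a cosine similarity, which live on different scales.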
