Small: MiniSearch / Lunr
For a catalogue of a few hundred to a few thousand screenplays, an in-browser search index is enough. Build it once with Node and ship the serialized index to the client:
import MiniSearch from 'minisearch';
import fs from 'node:fs/promises';
import { glob } from 'glob';

// Flatten each ScreenJSON file into one searchable record.
const files = await glob('screenplays/*.json');
const docs = await Promise.all(files.map(async (f) => {
  const d = JSON.parse(await fs.readFile(f, 'utf8'));
  return {
    id: d.id,
    title: d.title.en,
    logline: d.logline?.en,
    characters: d.characters.map((c) => c.name).join(' '),
    // Concatenate the English text of every dialogue element.
    dialogue: d.document.scenes
      .flatMap((s) => s.body)
      .filter((el) => el.type === 'dialogue')
      .map((el) => el.text.en)
      .join(' '),
  };
}));

const mini = new MiniSearch({
  fields: ['title', 'logline', 'characters', 'dialogue'], // indexed for search
  storeFields: ['title', 'logline'], // returned with each hit
});
mini.addAll(docs);

// Serialize the index so the browser can load it without re-indexing.
await fs.writeFile('index.json', JSON.stringify(mini.toJSON()));
Mid: Meilisearch / Typesense
For low-latency faceted search across a larger catalogue, run a dedicated search server:
import { MeiliSearch } from 'meilisearch';

const client = new MeiliSearch({ host: 'http://localhost:7700' });
const index = client.index('screenplays');

// Each call below enqueues a task that the server processes asynchronously.
await index.updateSearchableAttributes(['title', 'logline', 'characters', 'dialogue']);
// Filters only apply to fields present in the documents, so every record in `docs` must carry these.
await index.updateFilterableAttributes(['genre', 'themes', 'heading_contexts', 'heading_times']);
await index.addDocuments(docs);
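The records built for the MiniSearch index contain only `id`, `title`, `logline`, `characters`, and `dialogue`, so the four facet fields have to be derived before indexing. A minimal sketch of one way to do that follows. The helper names `uniq` and `facetFields` are hypothetical, and the property names `d.genre`, `d.themes`, `s.heading.context`, and `s.heading.time` are assumptions about the ScreenJSON layout rather than something this page confirms, so adjust them to match your files.

```javascript
// Hypothetical helpers: collect facet values for one screenplay.
// Assumed shape (not confirmed by this guide): d.genre (string or array),
// d.themes (array), and per-scene s.heading.context / s.heading.time (strings).
function uniq(values) {
  // Drop null, undefined, and empty strings, then de-duplicate in order.
  return [...new Set(values.filter((v) => v != null && v !== ''))];
}

function facetFields(d) {
  const scenes = d.document?.scenes ?? [];
  return {
    genre: uniq([d.genre].flat()),
    themes: uniq(d.themes ?? []),
    heading_contexts: uniq(scenes.map((s) => s.heading?.context)),
    heading_times: uniq(scenes.map((s) => s.heading?.time)),
  };
}

// Example with made-up data:
const sample = {
  genre: 'drama',
  themes: ['grief', 'family', 'grief'],
  document: {
    scenes: [
      { heading: { context: 'INT', time: 'NIGHT' } },
      { heading: { context: 'EXT', time: 'NIGHT' } },
      { body: [] }, // a scene without a heading contributes nothing
    ],
  },
};
const facets = facetFields(sample);
// facets.heading_contexts is ['INT', 'EXT']; facets.heading_times is ['NIGHT']
```

Spreading the result into each record, for example `{ ...doc, ...facetFields(d) }`, gives Meilisearch array-valued fields, and a filter on such a field matches when any element matches.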
Big: Elasticsearch
See How-to: Store ScreenJSON in Elasticsearch.
Semantic: vector DB
For “find me scenes that feel like this” queries, pair a full-text index with a vector DB populated from generated embeddings. Hybrid retrieval usually beats either approach alone.
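One common way to merge the two result lists is reciprocal rank fusion (RRF), which needs only the ranked ids from each retriever and not comparable scores. The sketch below is an illustration, not a method this guide prescribes: the function name `fuseRankings` is hypothetical, `k = 60` is a conventional default, and the scene ids are made up.

```javascript
// Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank),
// where rank starts at 1. Ids absent from a list contribute nothing for it.
function fuseRankings(rankings, k = 60) {
  const scores = new Map();
  for (const ids of rankings) {
    ids.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Made-up ranked ids from the full-text index and the vector DB:
const keywordHits = ['s1', 's2', 's3'];
const vectorHits = ['s3', 's1', 's4'];
const fused = fuseRankings([keywordHits, vectorHits]);
// fused is ['s1', 's3', 's2', 's4']
```

Ids that appear near the top of both lists rise above ids that only one retriever found, which is the behaviour you want when keyword and semantic matches disagree.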