Going Further
By itself, ScreenJSON is simply a platform-agnostic data interchange format that different programs and platforms can read and understand. Once an authored literary work is broken down into its programmatic elements and semantic content, it can be stored and analyzed in many meaningful ways by conventional open-source software.
NoSQL Document Stores
NoSQL document databases such as MongoDB and CouchDB store records as JSON or JSON-like documents. Any ScreenJSON file can be inserted into such a database as an individual document. Storing 5 million scripts as JSON documents is straightforward, and each document can then be queried programmatically like any other type of data.
A document-oriented database, or document store, is a computer program designed for storing, retrieving and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of NoSQL databases.
See: https://en.wikipedia.org/wiki/Document-oriented_database
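As a minimal sketch of what "programmatically queried" means, the following Python uses only the standard library to filter a small in-memory collection by field, roughly the way a document store matches documents. The field names (`title`, `genre`, `scenes`, `location`) and the helper functions are hypothetical and heavily simplified; they are not the real ScreenJSON schema, and a real document store would do the matching with its own query API and indexes.

```python
import json

# Two hypothetical, heavily simplified script documents.
# These field names are assumptions, not the real ScreenJSON schema.
RAW_DOCUMENTS = [
    '{"title": "Night Train", "genre": "thriller",'
    ' "scenes": [{"location": "STATION"}, {"location": "TRAIN"}]}',
    '{"title": "Paper Moon Cafe", "genre": "comedy",'
    ' "scenes": [{"location": "CAFE"}]}',
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def titles_with_location(collection, location):
    """Return titles of documents with at least one scene at the location."""
    return [
        doc["title"] for doc in collection
        if any(scene.get("location") == location for scene in doc.get("scenes", []))
    ]


# Parse each JSON string into a Python dict, as a database driver would.
collection = [json.loads(raw) for raw in RAW_DOCUMENTS]
thrillers = find(collection, genre="thriller")
cafe_scripts = titles_with_location(collection, "CAFE")
```

The same two questions (match on a top-level field, match on a nested scene field) are the kind of query a document store answers across millions of records.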
Graph Databases
Graph databases such as Neo4j and OrientDB store network-style relationships between things as nodes and connections, using a property graph model; RDF-based stores (e.g. Amazon Neptune, which supports both models) represent the same kind of data as triples. The semantic data defined in a ScreenJSON document can easily be imported into most of these systems, for example to create searchable relationships between characters, times, locations, and story incidents.
A graph database is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship). The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation.
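To illustrate how nodes and edges might be derived from a script, here is a small standard-library sketch that turns scenes into `(source, relation, target)` edges. The document layout, the relation names (`SET_IN`, `APPEARS_IN`, `SHARES_SCENE_WITH`) and both helper functions are assumptions made for illustration; they are not part of ScreenJSON or of any particular graph database.

```python
from itertools import combinations

# Hypothetical, simplified script structure (not the real ScreenJSON schema).
SCRIPT = {
    "scenes": [
        {"id": "s1", "location": "STATION", "characters": ["ANNA", "BORIS"]},
        {"id": "s2", "location": "TRAIN", "characters": ["ANNA", "CLARA"]},
    ]
}


def build_edges(script):
    """Return a set of (source, relation, target) edges for one script."""
    edges = set()
    for scene in script["scenes"]:
        # Each scene is linked to the location where it takes place.
        edges.add((scene["id"], "SET_IN", scene["location"]))
        # Each character is linked to every scene they appear in.
        for name in scene["characters"]:
            edges.add((name, "APPEARS_IN", scene["id"]))
        # Characters who share a scene are linked to each other.
        for first, second in combinations(sorted(scene["characters"]), 2):
            edges.add((first, "SHARES_SCENE_WITH", second))
    return edges


def neighbours(edges, node, relation):
    """Return the sorted targets reachable from node via the given relation."""
    return sorted(t for s, r, t in edges if s == node and r == relation)


edges = build_edges(SCRIPT)
anna_scenes = neighbours(edges, "ANNA", "APPEARS_IN")
```

An edge list like this maps naturally onto the import formats most graph databases accept, with each tuple becoming one relationship between two nodes.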
Search Engines
Search engine software such as Apache Lucene, Apache Solr, Elasticsearch, and Sphinx is designed to analyze a library of documents and build an index that can be queried for keywords at very high speed. Any ScreenJSON file can be inserted into a JSON-based indexer as an individual document. The separate elements of tens of thousands of screenplays can then be queried almost instantaneously, like googling a reading room.
Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
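One way to make individual scenes searchable is to index each scene as its own document. The sketch below only builds the newline-delimited JSON body that Elasticsearch's `_bulk` endpoint accepts (an action line followed by a source line for each document); it does not contact a server. The script fields, the `scripts` index name and the `to_bulk_payload` helper are hypothetical.

```python
import json

# Hypothetical, simplified script (not the real ScreenJSON schema).
SCRIPT = {
    "id": "night-train",
    "title": "Night Train",
    "scenes": [
        {"heading": "INT. STATION - NIGHT", "text": "Anna waits on the platform."},
        {"heading": "INT. TRAIN - NIGHT", "text": "Boris checks the tickets."},
    ],
}


def to_bulk_payload(script, index_name):
    """Build an NDJSON bulk body that indexes one document per scene."""
    lines = []
    for number, scene in enumerate(script["scenes"], start=1):
        # Action line: tells the bulk endpoint where to index the next line.
        action = {"index": {"_index": index_name, "_id": f"{script['id']}-{number}"}}
        # Source line: the scene itself plus enough context to identify it.
        source = {"title": script["title"], "scene": number, **scene}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = to_bulk_payload(SCRIPT, "scripts")  # "scripts" is an assumed index name
```

The resulting string would be sent as the body of a POST request to the `_bulk` endpoint with the `application/x-ndjson` content type.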
RDF/OWL
The Semantic Web is an initiative to make the information generated and shared by different computer systems meaningful. Instead of sharing unnamed and unrecognisable data, the information is categorised and labelled in such a way that a computer knows what it is, not simply what it does or how it should be displayed. The data in ScreenJSON documents is easily converted into RDF/XML, and hierarchical ontologies are simple to derive from it.
The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects.
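As a rough sketch of such a conversion, the following Python serializes a list of characters as RDF/XML using only the standard library. The `http://example.org/screenjson#` vocabulary, the `Character` class, the `name` property and the input fields are all placeholders invented for this example; a real conversion would use a published ontology.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/screenjson#"  # hypothetical vocabulary, for illustration only

# Use readable prefixes instead of auto-generated ones in the output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("ex", EX_NS)

# Hypothetical, simplified input (not the real ScreenJSON schema).
SCRIPT = {
    "characters": [
        {"id": "anna", "name": "Anna"},
        {"id": "boris", "name": "Boris"},
    ]
}


def characters_to_rdf_xml(script):
    """Serialize each character as an rdf:Description typed as ex:Character."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for character in script["characters"]:
        node = ET.SubElement(
            root,
            f"{{{RDF_NS}}}Description",
            {f"{{{RDF_NS}}}about": EX_NS + character["id"]},
        )
        # rdf:type states which class the resource belongs to.
        ET.SubElement(
            node, f"{{{RDF_NS}}}type", {f"{{{RDF_NS}}}resource": EX_NS + "Character"}
        )
        # A literal-valued property carrying the character's display name.
        name = ET.SubElement(node, f"{{{EX_NS}}}name")
        name.text = character["name"]
    return ET.tostring(root, encoding="unicode")


rdf_xml = characters_to_rdf_xml(SCRIPT)
```

Each character becomes a resource with a class and a labelled property, which is the raw material an OWL ontology would then organise into a hierarchy.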
Ontological Enhancement
Like natural language processing toolkits such as Stanford NLP, ontological engines (e.g. Apache Stanbol) can identify and analyze semantic entities (people, places, etc.) in documents, and can do so against custom ontologies. The content structured inside a ScreenJSON file or record can easily be fed into such an engine to identify the elements of a story’s “universe”.
Apache Stanbol’s intended use is to extend traditional content management systems with semantic services. Other feasible use cases include: direct usage from web applications (e.g. for tag extraction/suggestion; or text completion in search fields), ‘smart’ content workflows or email routing based on extracted entities, topics, etc.
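Feeding a script to an enhancement engine mostly means flattening its text and posting it over HTTP. The sketch below prepares such a request without sending it. It assumes a Stanbol instance running locally with its enhancer at `http://localhost:8080/enhancer`, which is an assumption about your deployment; the script fields and both helper functions are hypothetical.

```python
from urllib.request import Request

# Assumed endpoint of a locally running Stanbol launcher; adjust for your setup.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"

# Hypothetical, simplified script (not the real ScreenJSON schema).
SCRIPT = {
    "scenes": [
        {"text": "Anna waits on the platform in Vienna."},
        {"text": "Boris checks the tickets."},
    ]
}


def script_text(script):
    """Join the text of every scene into one plain-text body."""
    return "\n".join(scene["text"] for scene in script["scenes"])


def build_enhancer_request(script, url=STANBOL_ENHANCER_URL):
    """Prepare (but do not send) a POST request carrying the script text."""
    return Request(
        url,
        data=script_text(script).encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "text/turtle"},
        method="POST",
    )


request = build_enhancer_request(SCRIPT)
# Sending it would be: urllib.request.urlopen(request), given a running server.
```

The response, if the server is available, would describe the recognised entities as RDF, which can then be stored alongside the original document.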
Analytics
Open-source data analytics and monitoring platforms allow you to create charts, maps, and other graphical representations of your data. If a database (e.g. Elasticsearch) stores ScreenJSON documents, the combined statistical data is easy to visualize in a browser or desktop UI.
Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data driven culture.
See: https://grafana.com/
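The statistics behind such a chart are usually simple aggregations. As an illustrative sketch, the following Python counts dialogue lines and words per character across a set of scripts, producing the kind of series a dashboard could plot. The `lines`, `character` and `text` fields and the helper functions are hypothetical, not the real ScreenJSON schema.

```python
from collections import Counter

# Hypothetical, simplified scripts (not the real ScreenJSON schema).
SCRIPTS = [
    {
        "title": "Night Train",
        "lines": [
            {"character": "ANNA", "text": "Is this seat taken?"},
            {"character": "BORIS", "text": "No."},
            {"character": "ANNA", "text": "Thank you."},
        ],
    }
]


def dialogue_counts(scripts):
    """Count how many dialogue lines each character speaks."""
    counts = Counter()
    for script in scripts:
        for line in script["lines"]:
            counts[line["character"]] += 1
    return counts


def word_counts(scripts):
    """Count how many whitespace-separated words each character speaks."""
    counts = Counter()
    for script in scripts:
        for line in script["lines"]:
            counts[line["character"]] += len(line["text"].split())
    return counts


lines_by_character = dialogue_counts(SCRIPTS)
words_by_character = word_counts(SCRIPTS)
```

In practice the database would compute these aggregations itself and the visualization tool would query them directly; the code only shows what is being measured.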
Deep Machine Learning
Artificial intelligence can help to automate laborious tasks in large document datasets, while also “learning” literary and workflow patterns. Machine learning platforms and frameworks (e.g. Amazon SageMaker, AutoML, TensorFlow), some of them open source, specialise in managing complex analysis (e.g. neural networks) and can offer advanced capabilities for understanding the intricacies of large amounts of content (scripts, scenes, lines, etc.).
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.