Remember mail merge? According to Wikipedia, mail merge dates from circa 1980 and the WordStar word processor. The new term for mail merge seems to be document assembly, document generation, or even document automation.

"This feature is usually employed in a word processing document which contains fixed text (which is the same in each output document) and variables (which act as placeholders that are replaced by text from the data source). Some word processors can insert content from a database, spreadsheet, or table into text documents." (Wikipedia)

Unfortunately this remains an accurate description of how most document assembly (aka document generation) tools work, to this day. In this article I hope to convince you that you can do SO MUCH BETTER, using modern, open technology from Accord Project (a Linux Foundation project).

TL;DR Comparison Table

| Feature | Doc Assembly | Computable Documents |
| --- | --- | --- |
| Modularity | Limited | Publish and reuse templates privately and to https://templates.accordproject.org |
| Data models | Limited to basic types | Define custom complex types using Concerto. Import types from https://models.accordproject.org |
| Formulas | Limited | Powerful formulas based on the Ergo domain specific language |
| Standards | None | Open source, and open community, under Accord Project (Linux Foundation) |
| Text generation | Yes | Yes, including PDF, HTML, MS Word, markdown |
| Locale and formatting | Limited | Clear separation between data models (locale neutral) and text for templates, including type formatting |
| Extract data from text | None | Parsing supported for all templates, allowing data to be extracted from text |
| Translation | None | Parse (locale A) followed by Draft (locale B) can be used to translate the text of a document |
| Logic assembly | None | Templates may include backing logic |
| Template and document parsing | Limited and/or proprietary | Document object models are provided for both documents (CiceroMark) and templates (TemplateMark), allowing automated tools to generate or process documents and templates |
| Run anywhere | No | Open source code means documents can be assembled on everything from a webpage to a mainframe! |

Document Assembly

The process of document assembly involves combining data (variable values) and a document template (containing variable references) to produce a document.

Data

name: "Dan"
amount: 400
date: "1st September, 2020"

Text

I {{name}} hereby agree to pay {{amount}} GBP by {{date}}.

Document (Result)

I Dan hereby agree to pay 400 GBP by 1st September, 2020.
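
To make the mechanics concrete, here is a minimal Python sketch of this kind of blind substitution. The `assemble` function is a hypothetical helper written for this article, not part of any product; it simply swaps each {{variable}} for the string form of its value, with no knowledge of what the values mean.

```python
import re


def assemble(template: str, data: dict) -> str:
    """Replace every {{variable}} placeholder with the string form of its value."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in data:
            raise KeyError(f"no value supplied for variable '{key}'")
        return str(data[key])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


template = "I {{name}} hereby agree to pay {{amount}} GBP by {{date}}."
data = {"name": "Dan", "amount": 400, "date": "1st September, 2020"}

document = assemble(template, data)
print(document)
```

Note that the date is just a string here: the engine has no way to know it is a date, let alone reformat it for another locale.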

Although better than nothing, document assembly has limitations that severely limit its usefulness, particularly at enterprise scale. These include:

  1. Vendor lock-in: the market is dominated by proprietary vendor products, and all the time and effort taken to digitize documents and build templates locks enterprises into using a single vendor's engine.
  2. No semantics: the document assembly engine typically has no semantic knowledge about the sections of the document, or the variables. The engine blindly replaces variables with their values in the document. In the simplest incarnations the template is an MS Word document and the variables are marked up using [[variable]] tags, or similar.
  3. No composition: often it is not possible to assemble documents from modular, composable components.
  4. Limited type system: the types of variables are typically very restrictive, and sometimes all variables are simply treated as text substitutions, rather than as semantic concepts such as addresses, numbers, dates, times, durations, monetary amounts, party names, signatories, references to product SKUs, etc.
  5. Requires custom UI, such as forms, to define variable values: the engine may be able to pull data in from external sources (spreadsheets, CRM etc), but forms are used to manually enter the values for variables. Often these forms have to be designed or customized by hand, and have to be kept in sync with the variables used in the document templates.
  6. No separation between concepts and how they are represented for a given locale: for example, a document generated for US English should use MM/DD/YYYY date formats, while one generated for UK English should use DD/MM/YYYY. Address and monetary amount formats also differ widely by locale.
  7. No ability to extract data from existing documents using a single set of templates: the document assembly pipeline is strictly data + document template → document.
  8. Very limited ability to include formulas and calculations.

Computable Documents

Computable documents are semantically richer and much more powerful than basic document assembly, as they support two foundational operations:

  1. Draft: similar to document assembly, in that data and templatized text are combined to produce a document. Data + document template → document.
  2. Parse: templatized text is used to extract data from existing documents. Document + document template → data.

Drafting is used to produce text, and its inverse is parsing, which is used to re-extract data from a document. Both drafting and parsing are parameterized using the same Accord Project template.

In mathematical terms, parse and draft are isomorphisms, in that one is the inverse of the other. An Accord Project template defines a bi-directional mapping between semantic concepts and natural language text.
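
The round trip can be illustrated with a small Python sketch. This is not how Accord Project implements drafting and parsing; `draft` and `parse` below are hypothetical helpers that treat every variable as plain text, and the only point is that a single template string drives both directions.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def draft(template: str, data: dict) -> str:
    """Data + template -> document text."""
    return PLACEHOLDER.sub(lambda m: str(data[m.group(1)]), template)


def parse(template: str, document: str) -> dict:
    """Document text + template -> data, or ValueError if the text does not match."""
    pattern = ""
    position = 0
    for m in PLACEHOLDER.finditer(template):
        # Literal text between placeholders must match exactly.
        pattern += re.escape(template[position:m.start()])
        # Each placeholder becomes a named capture group.
        pattern += f"(?P<{m.group(1)}>.+?)"
        position = m.end()
    pattern += re.escape(template[position:])

    match = re.fullmatch(pattern, document)
    if match is None:
        raise ValueError("document does not match the template")
    return match.groupdict()


template = "I {{name}} hereby agree to pay {{amount}} GBP by {{date}}."
data = {"name": "Dan", "amount": "400", "date": "1st September, 2020"}

text = draft(template, data)
recovered = parse(template, text)
print(recovered == data)
```

Because this sketch is untyped, parsing returns strings only. Recovering numbers, dates and monetary amounts requires the typed data model described in the following sections.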

Drafting

Data + document template → document.

Computable documents extend basic document assembly:

  • With a rich and extensible data model: new types can be defined, and how they are mapped to natural language text is specified
  • Drafting is locale aware, and templates can specify natural language text for each supported locale, including how numbers, dates and monetary amounts are formatted
  • While drafting assembles text, the unit of composition is a template, and a template contains: natural language, a data model AND LOGIC. So, the output of drafting is an instantiated template (aka a clause), which may be backed by computable logic.
  • Templates may include inline formulas, specified using the Ergo domain-specific language, giving you an MS Excel-like language to define complex calculations whose results should appear in the assembled text
  • Internet scale composition of models is supported via a system of imports using URI/URL, allowing models and templates to be distributed publicly or privately using standard web technologies
  • Automatically generated web-forms may be used to edit variable values, irrespective of whether they are primitive or user-defined complex types.
  • Generate HTML, PDF, MS Word, or formatted markdown text, with other formats easily supported via extensions to the open source code base.

In the example below a new template is created for a simple document that contains a payment clause, and then the variables in the payment clause are defined, and how they are formatted.

Data Model

The data model for a template defines the name and type of the concepts used in a template. The data model is locale neutral and is specified using the Open Source Concerto schema language. Note that composition from Contract > Payment Clause is namespaced, type-aware and carries lots of semantics, in that in this template we are only expecting a single payment clause per contract.

namespace com.example

concept PaymentClause {
   o String name
   o MonetaryAmount amount
   o DateTime date
}

concept Contract {
   o PaymentClause paymentClause
}

Text

The text for a template defines the natural language (for a given locale). It binds the concepts to natural language and specifies how they are formatted. The text for a template is defined using an extended markdown format, allowing standardized rich text formatting, while remaining easy to process with a wide variety of open source tools.

This is some text.

{{#clause paymentClause}}
I {{name}} hereby agree to pay {{amount as "K0,0.00"}} by {{date as "DD MMM, YYYY"}}.
{{/clause}}

This is more text.

Data

Data is specified using a JSON data representation, and can be validated against the Concerto data model.

{
   "$class": "com.example.Contract",
   "paymentClause": {
      "name": "Dan",
      "amount": {
         "doubleValue": 400,
         "currencyCode": "GBP"
      },
      "date": "2020-09-01"
   }
}
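
As a rough analogue of validating data against a model, the Python sketch below mirrors the PaymentClause concept with dataclasses and rejects data that does not fit. It is illustrative only: it does not use Concerto, and the class and function names (`MonetaryAmount`, `PaymentClause`, `load_payment_clause`) are assumptions made for this example.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class MonetaryAmount:
    double_value: float
    currency_code: str


@dataclass(frozen=True)
class PaymentClause:
    name: str
    amount: MonetaryAmount
    date: datetime.date


def load_payment_clause(raw: dict) -> PaymentClause:
    """Check JSON-style data against the expected shape and return typed values."""
    missing = {"name", "amount", "date"} - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(raw["name"], str):
        raise TypeError("name must be a string")
    amount = raw["amount"]
    if not isinstance(amount, dict) or not isinstance(amount.get("doubleValue"), (int, float)):
        raise TypeError("amount.doubleValue must be a number")
    if not isinstance(amount.get("currencyCode"), str):
        raise TypeError("amount.currencyCode must be a string")
    return PaymentClause(
        name=raw["name"],
        amount=MonetaryAmount(float(amount["doubleValue"]), amount["currencyCode"]),
        # ISO 8601 dates keep the data locale neutral.
        date=datetime.date.fromisoformat(raw["date"]),
    )


contract = {
    "$class": "com.example.Contract",
    "paymentClause": {
        "name": "Dan",
        "amount": {"doubleValue": 400, "currencyCode": "GBP"},
        "date": "2020-09-01",
    },
}

clause = load_payment_clause(contract["paymentClause"])
print(clause)
```

The value of a typed model is that the amount stays a number with a currency and the date stays a date, so formatting decisions can be left to the template text.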

Document (Result)

The text is combined with the data to produce the drafting result. Note that monetary amount and date formatting may be applied, consistent with the template, while preserving the semantics of dates and monetary amounts within the data.

This is some text.

I Dan hereby agree to pay £400.00 by 01 Sep, 2020.

This is more text.
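
The formatting step can be approximated in a few lines of Python. This sketch assumes that `K` in the amount format stands for the currency symbol and that `0,0.00` means thousands grouping with two decimal places; the symbol and month tables, and the function names, are illustrative assumptions rather than the Accord Project implementation.

```python
import datetime

# Illustrative lookup tables; a real implementation would be locale driven.
CURRENCY_SYMBOLS = {"GBP": "£", "USD": "$", "EUR": "€"}
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_amount(value: float, currency_code: str) -> str:
    """Approximates the "K0,0.00" format: symbol, grouped thousands, two decimals."""
    return f"{CURRENCY_SYMBOLS[currency_code]}{value:,.2f}"


def format_date(value: datetime.date) -> str:
    """Approximates the "DD MMM, YYYY" format."""
    return f"{value.day:02d} {MONTHS[value.month - 1]}, {value.year}"


amount_text = format_amount(400, "GBP")
date_text = format_date(datetime.date(2020, 9, 1))
print(f"I Dan hereby agree to pay {amount_text} by {date_text}.")
```

The data itself never changes: only the presentation chosen by the template does.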

Parsing

Document + document template → data.

The ability to parse documents and to extract data from them (using the same template used for drafting) is a transformative capability, that opens up lots of powerful new scenarios:

  • When a document is received from a counterparty, parse it to extract data from it
  • Ensures that the serialization of data for a document remains human readable, rather than an opaque binary blob or a JSON data structure disconnected from the generated text
  • Foundational for many text based tools, such as semantic and full text search, diff/merge, editing via GitHub integration via pull requests etc
  • Places humans at the center of data exchange, allowing them to send each other documents via email, chat etc and then for the documents to be parsed to extract contract or document data
  • Parsing is safe and complete (it either succeeds or fails), not approximate
  • Batch ingestion and processing of documents
  • Foundation for fuzzy parsing
  • Automated translation by parsing in one locale, and then re-drafting into the target locale
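
The last point, translation by parsing in one locale and re-drafting in another, can be sketched for a single date value in Python. The locale table and function names are assumptions made for this example; a real template would carry the full text and formats for each locale.

```python
from datetime import date, datetime

# Hypothetical per-locale date formats (numeric, so independent of system locale).
LOCALE_DATE_FORMATS = {
    "en-US": "%m/%d/%Y",
    "en-GB": "%d/%m/%Y",
}


def parse_date(text: str, locale: str) -> date:
    """Text in a given locale -> locale-neutral date value."""
    return datetime.strptime(text, LOCALE_DATE_FORMATS[locale]).date()


def draft_date(value: date, locale: str) -> str:
    """Locale-neutral date value -> text in a given locale."""
    return value.strftime(LOCALE_DATE_FORMATS[locale])


def translate_date(text: str, source_locale: str, target_locale: str) -> str:
    """Parse in the source locale, then draft in the target locale."""
    return draft_date(parse_date(text, source_locale), target_locale)


print(translate_date("09/01/2020", "en-US", "en-GB"))
```

The intermediate value is the same 1 September 2020 in both locales, which is what makes the translation safe rather than a guess based on the text alone.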

Data Model

Note that the data model is identical to the one used for drafting.

namespace com.example

concept PaymentClause {
   o String name
   o MonetaryAmount amount
   o DateTime date
}

concept Contract {
   o PaymentClause paymentClause
}

Text

Note that the text is identical to the one used for drafting.

This is some text.

{{#clause paymentClause}}
I {{name}} hereby agree to pay {{amount as "K0,0.00"}} by {{date as "DD MMM, YYYY"}}.
{{/clause}}

This is more text.

Document

Document text is provided to the parse operation.

This is some text.

I Dan hereby agree to pay £400.00 by 01 Sep, 2020.

This is more text.

Data (Result)

The result of parse is a JSON data structure which can then be round-tripped back to document text by calling draft!

{
   "$class": "com.example.Contract",
   "paymentClause": {
      "name": "Dan",
      "amount": {
         "doubleValue": 400,
         "currencyCode": "GBP"
      },
      "date": "2020-09-01"
   }
}

Summary

In this article we've just scratched the surface of the capabilities of Accord Project templates, but I hope this has inspired you to look at this Open Source, Open Community technology as the foundation for your document assembly (draft and parse!) tools and solutions.

To learn more about creating templates visit https://accordproject.org and upload your templates to your Clause account today.