Open Source Document Assembly

On a recent pitch, I was asked what value-add GhostFill offered over open-source, Linux-based document assembly tools.  The answer was, “What tools?” There are no open-source document assembly development projects.  Complex rule-driven text manipulation is a mix of “content-management” and “programming”.  Content management deals with Word, RTF, Text, HTML, and PDF formats, traditionally handled on a Windows platform with Windows tools.

There is no inherent monopoly of Windows for document assembly.  However, given the small size of the market, there has been no groundswell of open-source developers.  The power of document automation tools is not in “mass printing”, but rather in delivering a document fully customized to the user’s inputs.  A system should evolve over time.  The programming tool needs to be simple enough for content managers to understand, yet powerful enough to produce documents that are “ready to print” or “ready to file”.  It is this grey area between “mass production” and “custom production” where the document assembly engines excel.  However, because the “market”, as opposed to the potential, is so small, only those vendors with “other reasons” for entering this space have jumped in.

XML Based Document Assembly – D3 and IPManager

Office13, aka Office 2006, now in final beta, introduces a new format WordML, a Microsoft variant of XML that includes XML objects and proprietary Word formatting extensions.  XML has the potential to revolutionize document assembly, allowing for the creation of dynamic editable templates.  D3 and Perfectus make extensive use of XML encoding in Office 2003.  They present a case study in diametrically opposite approaches to the design of document automation software.

D3 from Microsystems and Perfectus IP Manager from Perfectus Solutions approach document assembly from opposite ends of the spectrum.

IP Manager has a sophisticated application builder with controls for laying out web-based interviews in a powerful GUI while XML tagging a family of templates.  It allows the family of templates to be “packaged” and shipped.  The system is modular with reusable components and definitions.  From a hosted server, multiple offices can access the document sets.  The system can be blended into a DocsOpen or WebDocs solution with full workflow, or can manage its own resulting documents and answer sets.

D3, by contrast, merges a SQL Server database with Word objects that are “parsed” and uploaded into the database.  D3 is a powerful clause manager and document modeller.  It allows the users to pick and choose among clauses, and merge the clauses into a finished and styled product.  While a “document form” can be “imported” from another system, the best way to work with D3 is on-site.  When it comes to variables, the interview is relegated to a sidebar with one variable displayed at a time.

The key difference really is the target.  D3 is designed for the “trusted user” who is likely an attorney, giving him or her access within a few strokes to “data sources” in their network (Outlook, Billing, Client Lists etc.) and to a comprehensive and structured clause bank.  The user chooses the “objects” and brings them into the document, then runs a filler which looks for unanswered variable fields.  Because the fields, generally with a 1 to 1 mapping (no collections), are XML encoded, the trusted user can change the “document” and then have the fields refresh based on changed data.

IP Manager is targeted towards the “untrusted user”.  As with other document assembly systems (HotDocs, GhostFill, and to an extent DealBuilder), the “templates” are locked in a developer-controlled application.  The developer – knowledge worker has determined what provisions are appropriate based on the responses to a series of questions, grouped into pages or dialogs in a structured interview.  The focus is on the questions, not the words of the documents.  The “untrusted user” is typically found in a corporate environment where a corporate counsel is deploying standard forms across a disparate sales force.

Basha Systems Document Assembly Blog Ready for Syndication and HotDocs 2006

We have been busy restructuring all the websites in the Basha Systems family: our new client portal, our main consulting site, our FogBugz project tracking site, and our store.  There are two other sites almost ready to go live to support our Nebraska Probate System V and a new system for Building Inspectors. With these items out of the way, I am ready to return to commenting on developments in Document Assembly.  In particular, HotDocs 2006 is about to go into Full Beta. I have gathered comments posted by Marshall Morrise, chief architect of HotDocs, to the public HotDocs list.  When the product comes out of beta and is released, come back for a discussion of these innovative new features.

There has been discussion on the HotDocs List of several potential features in the HotDocs 2006 Beta.  I will confine my comments to posts from Marshall Morrise, the chief architect of HotDocs, to the HotDocs ListServer.  These posts reveal some major new features under the hood that will catapult HotDocs to the lead as a full-fledged document assembly development platform.

February 9, 2006: Feature Set

All new features we have discussed on the list over the past several months (including the one referred to below) are being implemented in HotDocs 2006. The one exception is support for WordPerfect X3, which has been added to HotDocs 2005 SP3 (available soon). The reason we put this into a 2005 version is that WordPerfect X3 is already out and we don’t feel we can expect customers who upgrade to it to wait until HotDocs 2006 comes out in June.

Feb. 9, 2006: Access macros from a non-loaded template

The discussion regarding macros stored in a secondary template had to do with HotDocs 2006. We have implemented the feature we discussed. A new component file property allows you to specify a Word .dot file that contains PLAY macros. The .dot file must be in the same folder as the template that includes the PLAY instruction. Thus, if you distribute a template set where one or more of the templates includes a PLAY instruction, instead of requiring users to copy the PLAY macros into, or putting them in a template that must be copied to the Word Startup folder, you can simply include the .dot with the template set.

January 20, 2006: Pretty Templates

In this example, variable titles have been substituted for variable names and square brackets have been substituted for IF/END IF pairs. Color would be used to identify variables and the brackets for IF instructions (which I can’t show here because the list server just strips color out). Footnotes would be used to show the actual IF logic. Other things would be done as well.

December 13, 2005: Required Fields

Because the HotDocs 2005 Server interface mimics the desktop interface, it does not include asterisks for required fields. Instead, the fact that a required field is not answered is indicated by a red asterisk in the interview outline. That said, requests for the asterisks have been frequent enough that both the desktop and server interfaces for HotDocs 2006 will include a user option to show red asterisks at the left side of prompts for required fields.

November 23, 2005: Update Table of Contents and Indexes in Word

In HotDocs 2006 we’ve added a component file option titled “Update table of contents and index after assembly”. If you turn this on, HotDocs will automatically update the TOC and indexes so you don’t have to PLAY a macro to do it.

November 17, 2005: Inserting an OR in a document based on rules

At the moment, the only way to accomplish what you are asking about is to write some computations that figure out which lines get included in the document. Someone on the list can probably help you out with an approach. For everyone’s information, a feature that will make this very easy (no computations involved) will be included in HotDocs 2006.

November 2, 2005: Master Component File – But Independent Document Interviews

Just so you know, HotDocs 2006 will allow you to specify a different interview component for each template that points to a shared component file. While this doesn’t address all of the issues raised regarding using shared component files, it does address the issue Bart mentions immediately below.

October 19, 2005: Unique Values in a Multiple Choice Variable

I have received a request to produce a function that will filter out unique responses from a repeated multiple choice variable.  To get this list of unique selections across the repeated set of answers, I have to do quite a bit of scripting, including setting a counter, copying over options from the Favorite Desserts variable to a temporary collection variable, using a WHILE to look through the collection variable to make sure I don’t duplicate a choice, etc.
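
The scripting described above (a counter, a temporary collection, and a loop that checks for duplicates) amounts to an order-preserving de-duplication. As a rough illustration only, here it is in Python rather than HotDocs script; the function name and the sample data are hypothetical and not part of any HotDocs API.

```python
def unique_selections(repeated_answers):
    """Return each selected option once, in the order first seen."""
    seen = set()
    unique = []
    for selections in repeated_answers:  # one list per repeat iteration
        for option in selections:
            if option not in seen:       # skip choices already collected
                seen.add(option)
                unique.append(option)
    return unique


# Hypothetical answers to a repeated "Favorite Desserts" multiple choice variable.
favorite_desserts = [["Pie", "Cake"], ["Cake", "Ice Cream"], ["Pie"]]
print(unique_selections(favorite_desserts))
```

A built-in function would replace all of this bookkeeping with a single call, which is the point of the request.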

October 6, 2005: Text Manipulation Functions

We have discussed the addition of some built-in text manipulation functions to HotDocs to make it easier to manipulate text answers, particularly multi-line answers. There have been a number of suggestions, including “normalizing” multiline text to make it single-line, stripping off white space, etc.

After reviewing the emails and suggestions offered, we proposed to create the following built-in text manipulation functions:

STRIP(text variable, characters, frombeginning, fromend)

text variable = any text variable
characters = characters to be stripped off
frombeginning = TRUE if characters to be stripped from the beginning of the text
fromend = TRUE if characters to be stripped from the end of the text

Characters to be stripped off can include the following “pseudo-codes”:

^t for a tab
^s for a hard space
^p for a paragraph mark/hard return
^w for any white space

etc. (rather like what you can do with Find and Replace in Word)

REPLACE(text variable, searchfor, replacewith, all)

text variable = any text variable
searchfor = character string
replacewith = character string
all = TRUE to replace all occurrences of searchfor with replacewith instead of just the first one

Searchfor and replacewith can include the same pseudo-codes shown for STRIP above.


SPACE(text variable)

Yields the answer plus a space if the text variable is answered or an “answered nothing” otherwise.

Examples of use:

SET MultiLineAddress TO STRIP(MultiLineAddress, “^w”, TRUE, TRUE)

Strips white space from the beginning and end of a multi-line address.

SET OneLineAddress TO REPLACE(MultiLineAddress, “^p”, “, “)

Replaces returns in the multi-line address with a comma and a space to form a single line address.
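
To make the proposed behaviour concrete, here is a minimal Python sketch of how STRIP and REPLACE might work. It is not HotDocs code: the names `hd_strip` and `hd_replace`, the mapping of the pseudo-codes, and the choice to replace all occurrences when the fourth argument is omitted (as in the example above) are all assumptions made for illustration.

```python
import re

# Pseudo-code -> (regex fragment for matching, literal text for output).
# The exact meaning of each code is an assumption based on the list in the post.
PSEUDO_CODES = {
    "^t": (r"\t", "\t"),         # tab
    "^s": ("\u00a0", "\u00a0"),  # hard (non-breaking) space
    "^p": (r"\r?\n", "\n"),      # paragraph mark / hard return
    "^w": (r"\s", " "),          # any white space (written back as one space)
}


def _translate(spec, as_regex):
    """Split spec into pieces, expanding pseudo-codes along the way."""
    pieces, i = [], 0
    while i < len(spec):
        code = spec[i:i + 2]
        if code in PSEUDO_CODES:
            pieces.append(PSEUDO_CODES[code][0 if as_regex else 1])
            i += 2
        else:
            pieces.append(re.escape(spec[i]) if as_regex else spec[i])
            i += 1
    return pieces


def hd_strip(text, characters, from_beginning, from_end):
    """Remove any run of the listed characters from either end of text."""
    alternatives = "|".join(_translate(characters, as_regex=True))
    if not alternatives:
        return text
    if from_beginning:
        text = re.sub(rf"\A(?:{alternatives})+", "", text)
    if from_end:
        text = re.sub(rf"(?:{alternatives})+\Z", "", text)
    return text


def hd_replace(text, search_for, replace_with, replace_all=True):
    """Replace search_for with replace_with; only the first match if replace_all is False."""
    if not search_for:
        return text
    pattern = "".join(_translate(search_for, as_regex=True))
    replacement = "".join(_translate(replace_with, as_regex=False))
    # A function replacement keeps any backslashes in the output literal.
    return re.sub(pattern, lambda m: replacement, text, count=0 if replace_all else 1)


address = hd_strip("  \n12 Main St\nSpringfield \n", "^w", True, True)
print(hd_replace(address, "^p", ", "))
```

The sketch treats the characters argument of STRIP as a set of alternatives and the searchfor argument of REPLACE as a sequence, which matches how the two examples read.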

With SPACE, instead of a computation containing:

ClientFirstName + “ “
IF ANSWERED(ClientMiddleName)
RESULT + ClientMiddleName + “ “
END IF
RESULT + ClientLastName

you could use:

ClientFirstName + “ “ + SPACE(ClientMiddleName) + ClientLastName
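
The same idea in a short Python sketch, offered only as an illustration: an unanswered variable is modelled here as `None` or an empty string, and both function names are hypothetical.

```python
def space(answer):
    """Return the answer plus a trailing space, or nothing if unanswered."""
    return f"{answer} " if answer else ""


def full_name(first, middle, last):
    # Mirrors: ClientFirstName + " " + SPACE(ClientMiddleName) + ClientLastName
    return first + " " + space(middle) + last


print(full_name("Ann", "B.", "Lee"))
print(full_name("Ann", None, "Lee"))
```

Because the helper swallows the missing middle name together with its trailing space, the caller never has to test whether the variable was answered.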

October 6, 2005: Bold, Italics and Underline in Computations and List Punctuation


1. Quite a number of developers have requested a way to identify bold, underscore, and italics in text produced by computations for insertion into a document.

2. A similar number of developers have requested a way to get bold, underscore, and italics into additional text or prompt text in a dialog.

3. A smaller, but still meaningful number of developers have requested a way to get “a, b, and c” style punctuation into a document without using a repeat. For example, if I want to list a client, the client’s spouse, and their children as “My family consists of myself, my spouse Sam Jones, my son Tim Jones, and my daughter Sue Jones”, where the children come from a repeated dialog but the parents do not, I have to do some tricky scripting.

Proposed Solution

We propose to implement a new kind of field that can be inserted into text. The field will contain something I’ll call a “dot command”. Here are some examples (using single angle brackets in place of chevrons):

In a computation script:

SET Variable TO “Please be <.b>very<.eb> careful when moving the canister.”

The “.b” and “.eb” commands represent bold and end bold respectively. When this text is merged into a document, or when it is displayed in a dialog, the visible dot commands will be replaced by actual bolding of the word “very”.

Similar commands would be implemented for italics (.i) and underscore (.u).
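
As a rough sketch of the translation step, the Python below swaps dot commands for HTML tags, which stand in for real bold, italic, and underline formatting. The end commands “.ei” and “.eu” are assumed by analogy with “.eb”; the post does not name them, and the function name is hypothetical.

```python
import re

# Dot command -> stand-in markup. ".ei" and ".eu" are assumed names.
TAGS = {
    ".b": "<b>", ".eb": "</b>",
    ".i": "<i>", ".ei": "</i>",
    ".u": "<u>", ".eu": "</u>",
}

DOT_COMMAND = re.compile(r"<(\.[a-z]+)>")


def render_dot_commands(text):
    """Replace known dot commands; leave anything unrecognized untouched."""
    return DOT_COMMAND.sub(lambda m: TAGS.get(m.group(1), m.group(0)), text)


print(render_dot_commands("Please be <.b>very<.eb> careful."))
```

Leaving unknown commands in place keeps the text readable if a template uses a command the renderer does not yet understand.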

It might be good if we used longer words for the dot commands, like “.begin bold” and “.end bold”. These would be easier to recognize in the text. The downside is, they’re long.

It might be good if we were to use some other character to “introduce” the new commands. We need to use something that has never been allowed as a valid character to begin a HotDocs component name. One suggested character has been the backslash. We (I) like periods because they’re fairly unobtrusive.

QUESTION 1: How do you feel about this scheme of allowing visible fields in text that will be translated into the “real thing” in documents or dialogs? If you don’t care for it, do you have other suggestions?

QUESTION 2: Do periods (dots) work for you, or do you think some other character would be better?

QUESTION 3: Are you OK with short mnemonics (like “.b” and “.eb”) or do you prefer longer commands (like “.begin bold” and “.end bold”)?

In addition to dot commands for specifying bold, underscore, and italics, we’ve considered commands like:

<.an> inserts “a” or “an” depending on the word that follows
<.> inserts a period, but only if no punctuation precedes the dot command (useful when inserting a sentence typed by the user as an answer)
<.lq> inserts a curly left quote
<.rq> inserts a curly right quote

There are more things we’ve thought of (curly apostrophes, other conditional punctuation, etc.)
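
A minimal Python sketch of how these four commands might behave. The vowel test for “a” versus “an” is a crude stand-in (it will get words like “hour” wrong), the set of characters counted as punctuation is my assumption, and the function name is hypothetical.

```python
import re

PUNCTUATION = ".!?,;:"  # assumed set of marks that suppress the period


def apply_text_commands(text):
    """Expand <.lq>, <.rq>, <.an> and <.> in text."""
    # Curly quotes.
    text = text.replace("<.lq>", "\u201c").replace("<.rq>", "\u201d")

    # <.an>: choose by the first letter of the following word (rough heuristic).
    text = re.sub(
        r"<\.an>(?=\s*([A-Za-z]))",
        lambda m: "an" if m.group(1).lower() in "aeiou" else "a",
        text,
    )

    # <.>: add a period only when the preceding character is not punctuation.
    def conditional_period(m):
        previous = m.group(1)
        return previous if previous and previous in PUNCTUATION else previous + "."

    return re.sub(r"(.?)<\.>", conditional_period, text, flags=re.DOTALL)


print(apply_text_commands("She bought <.an> apple and <.an> pear<.>"))
print(apply_text_commands("Is it done?<.>"))
```

The conditional period is the interesting case: it lets a template end a user-typed sentence cleanly whether or not the user supplied the final punctuation.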

QUESTION 4: Are there other commands of this sort you’d like to see implemented? If so, please describe them.

As the solution to non-repeated list punctuation, we propose to implement:

<.p “format”> identifies the beginning of a punctuated list and gives the format
<.p> identifies the spot where punctuation characters should be inserted
<.pe> identifies the end of the punctuated list

Using the family example described above, you could have something like:

<.p “a, b, and c”>My family consists of myself<.p>my spouse <.p>my Name><.p><.pe>.

which would be assembled as

My family consists of myself, my spouse Sam Jones, my son Tim Jones, and my daughter Sue Jones.
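
The punctuation logic behind such a list can be sketched in Python as follows. The parsing of the “a, b, and c” format string and the handling of a two-item list are my assumptions about how the feature might behave; `punctuate` is a hypothetical name.

```python
import re


def punctuate(items, fmt="a, b, and c"):
    """Join items using the separators implied by a format like 'a, b, and c'."""
    match = re.fullmatch(r"a(.*?)b(.*)c", fmt)
    if match is None:
        raise ValueError(f"unrecognized list format: {fmt!r}")
    separator, last_separator = match.groups()

    items = [item for item in items if item]  # drop unanswered entries
    if len(items) <= 1:
        return "".join(items)
    if len(items) == 2:
        # Assumed: with two items, drop the leading comma ("x and y").
        pair_separator = re.sub(r"^[,;]\s*", " ", last_separator)
        return items[0] + pair_separator + items[1]
    return separator.join(items[:-1]) + last_separator + items[-1]


family = ["myself", "my spouse Sam Jones", "my son Tim Jones", "my daughter Sue Jones"]
print("My family consists of " + punctuate(family) + ".")
```

The list can mix fixed entries (the parents) with entries gathered from a repeated dialog (the children), since the function only sees the final sequence of items.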

QUESTION: Will this be useful to you?

I should mention that we expect to do two things:

1. Create an interface for inserting dot commands (so you don’t have to remember them).
2. Make it possible to hide dot commands so users who preview your templates won’t see them.

Basha Systems Client Portal Launched

At Basha Systems we have recently launched a new and improved client portal.  The system is built in SubDreamer and phpBB2.  We provide a private space for client announcements and downloads. On the client home page we give an overview of recent postings in a private threaded discussion forum.

To check out the portal, log in as “clienttest” with the password “clienttest” and give it a test drive.  We set up the portal for all client projects that exceed $15,000.

The New Paradigm in Client Communications

The typical client communication has degenerated into a tsunami of e-mails.  When there are only two players on a project, e-mail communications are manageable.  If the users have implemented filters on their e-mail, or use a case management product like Time Matters, then you can build a thread of e-mails on a topic.  However, once multiple developers and a team of clients join the project, e-mail threads quickly get out of control.  People are copied on e-mails who should not be.  E-mails grow in length as they get replied to and forwarded.

By contrast, a Discussion Forum has Topics.  An initial post can have a range of “replies”.  The team can “vote” on topics … respond to a poll.  The whole forum is searchable with an intelligent search engine.  Topics can be “closed” and moved to a closed issues forum.  When combined with a publishing engine like SubDreamer, the topics can lead to “announcements” which are made to a selected group based on login.  If there is a file that needs to be downloaded, it can be posted to the server and be accessible without fancy FTP (File Transfer Protocol) software via a download manager.

Document Assembly and Information Portals

Basha Systems continues its exploration of the most efficient technology to provide information on document assembly and its potential as a “disruptive technology” that creates “profits” for the practice of law. Part of this exploration has been the porting of this blog to Expression Engine. Other examples can be seen on the main web site with the introduction of Video Tours of our system. See Document Assembly Video Tours. Another element is the use of integrated web publishing and threaded discussion technology which will soon be available as a private Client Portal on Bashasys.Ne

Related Link: A sample link to our Client Support Portal

What is the promise of Web portals and why should anyone who cares about document assembly care?

Web portals no longer cost thousands of dollars.  There are open source web portals, and there are low priced software systems which contain dozens of customizable skins.  While these “portals” are not designed for an “attorney” to set up WYSIWYG … they do present a powerful, low cost alternative that is within the reach of most law firms.  And … the benefit is that since portals are driven by databases … and not HTML coded pages, it is incredibly easy to add new articles, or manage threaded discussions without ANY knowledge of HTML, PHP, MySQL and the plethora of intimidating web acronyms.

The reason you should care is simple.  Many clients start out with a “Google search” before they go to an attorney.  It is not that they use the Google search to choose the attorney, but they use it to inform themselves of the law in a particular area so that they can more effectively use the attorney’s “expensive” time.  That is because attorneys, for better or worse, are viewed as “expensive” … and the less time you spend with an attorney, the better.  “Some of my best friends are attorneys … and I still maintain my own bar rap.”

How Web Portals Increase Visibility

Web portals, particularly ones like ExpressionEngine and SubDreamer, are entirely database driven.  The entire website consists of a series of cascading templates and style sheets which pull information from a database of posted articles.  This means you have an instant, current, easily modifiable, completely searchable and indexed set of articles, organized by categories and subcategories, that can demonstrate the expertise of your firm.  Most web portals include a guestbook that can be used to “track leads” on new potential business.  And, for clients, there is support for “member groups” so that you can publish articles specific to a client, based on the client’s login.

Move up to the Big League with a low budget

This is not to denigrate Sharepoint (from Microsoft) or some of the other bigger web portals.  There is a place for them.  But the starting price for those systems is easily in the tens of thousands of dollars before you see a clear benefit.  By contrast, you can get a hosted web site ($50/month), a license to SubDreamer ($125) and engage a web developer who can “skin” these sites in two days ($3,000) and you have a complete publishing platform and client support system.  You can replace the “pesky emails” that go awry with a secure, private threaded discussion forum that only you and your client can see.  And this will be a forum that you can see from any internet-connected terminal, subject to proper login.

And the Reason I care

We are not in the web development business at Basha Systems, but we have invested hundreds of hours in learning these tools and applying document assembly object oriented programming principles to their development.  We use these tools to communicate effectively with our clients.  Our hope is that with these tools, our clients will be able to generate more business, and more revenue which can be reinvested into building profitable document assembly systems with HotDocs, GhostFill and DealBuilder.



Lessons from SuperBowl XL – I Can’t Get No Satisfaction


Superbowl XL was a tour de force, with the largest audience of any show all year.  It is the one football game I watch each year … “I watch for the commercials” and the “half-time” show.  This year, Mick Jagger of the Rolling Stones performed the half-time show … without a “wardrobe malfunction”.  At the age of 62, Mick Jagger opened with the remark that it took 40 years from when he burst onto the Rock ‘n Roll scene to make it to the mainstream of Superbowl XL (“40”).  Dancing around a giant tongue (stuck out at full length, as in a “Brooklyn cheer”) he launched into a rousing rendition of “I can’t get no Satisfaction …” and I tried, and I tried, and I tried.


It has now been over a decade since I left my comfortable “New York” litigation practice to enter the area of document assembly consulting.  In the ten years, document assembly has gone from “fringe and novel” to “mainstream”.  For many attorneys it is a central part of their process, whether it be for generating high-volume pleadings, preparing probate applications, or drafting wills and complex tax-sheltering trusts.  This article is a reminiscence on the past decade in document assembly.


When I first started writing on document assembly, in the early days of the commercial internet, I was flamed as a traitor.  I took the position, which I hold now, that the billable hour is a dinosaur.  I argued that unless law firms changed their business model, they would soon find themselves priced out of the legal market by more nimble competitors.  I took the position that law firms who innovated and invested in their intellectual property, would rise to the top in terms of profitability per partner.


The response I received, to say the least, was caustic.  How could you “commoditize the law”?  Do people get legal services from “Walmart” or “Lawyers ‘r Us”?  The quality of legal services in this country will plummet.  Lawyers are unable to determine the “true cost” of services, and should be rewarded for the hours they labour.  Once you “fix prices”, lawyers will stop innovating and lower the quality of the services they provide.  We are serving as “advisors” to our clients – how can we put a price on that advice?  When I suggested that with value-billing and project-based billing, the effective hourly rate for service could reach into thousands of dollars per hour, I was told that lawyers who charged such fees should be brought before bar association ethics committees and disbarred.


Lucky for me, I was no longer practicing law, but rather “aiding and abetting” my clients in the commission of these ethics crimes.  I was making it possible for my clients to “fix” the fee for a client for the delivery of a defined service.  The client would get a “cap” on the fee and a fixed expectation of the service to be rendered.  By defining the service with particularity, the law firm then had the incentive to invest in building intellectual property and systems to deliver that service with a minimum of time.  I spent much of my time urging my clients to consider the R.O.I. on document assembly.  Most of the time, these projects were implemented only for “loss leaders” – services which were not profitable under the hourly model – and thus the investment would “save” the firm from having to write off hours.


For ten years, I got no satisfaction.  And I tried, and I tried, and I tried.  Yes, I found a number of enterprising attorneys who, faced with the choice of hiring additional staff to handle a burgeoning workload or investing in automation technology and consulting, chose the CHEAPER alternative – automation.  I would spend 30% of my time marketing and doing demos.  Ten years later, I spend 5% of my time marketing.  These websites bring in several leads every week … people who have already made a decision to automate.  The only question is whether the cost of our services will fit into their Return on Investment.  Many of our clients are pioneers, pushing the limits of automation.  They have played with HotDocs and GhostFill, worked with merge fields and done simple automation.


And now they are ready to go to the next level, and build a custom application to handle a complete practice area.  Those are the inquiries we are now receiving.  Document assembly is no longer just for documents.  Most of our systems begin with a Master Information List to gather data and then a switchboard.  We invite you to look at videos of our current systems.  Video Tours


Over the years, I have worked with a number of Document Assembly Products. For more information on these products, click on links below:

  • HotDocs
  • GhostFill
  • DealBuilder
  • Smartwords
  • MasterDraft
  • PowerTXT
  • ThinkDocs
  • WinDraft
  • Perfectus
  • qShift
  • Microsoft Word
  • Corel WordPerfect

LegalTech 2006 – Document Assembly

Another year has come and gone … LegalTech New York … The largest annual technology show.  Despite the emphasis on Litigation support systems, there were some notable participants at the conference presenting document assembly solutions.  HotDocs was there as part of LexisNexis’ Total Practice Management initiative; DealBuilder with its online document assembly system powered by a unique “relevance engine”; Perfectus Solutions with its browser-based IPManager document creation and delivery system; iXIO with its innovative online document modelling solution (Q-Shift); and Microsystems with its Word-ML based document creation system (D3).

I met with each of the vendors.  Several of the products are ones that we support: HotDocs, DealBuilder, GhostFill, Time Matters and Perfectus.

We were impressed by the level of energy and innovation in the document assembly space.  This is not meant as a review of these products.  That will come later.  Rather, it is a recognition that there is some serious programming talent coming into developing document assembly solutions.  There are more tools than ever, and more powerful tools than ever, to help firms and corporations provide document creation services.

HotDocs is working on HotDocs 2006. … Under the hood are dozens of new features for “true” application developers.  When the new HotDocs 2006 comes out we will review it.  For now, to see what can be done with HotDocs, please view the link below and take a tour of some of our videos.

MicroSystems, a new entrant in the space, brings D3.  This is a cross between a knowledge management capture tool, clause picker and Word-ML based document assembler.  It doesn’t fit the classical document assembly template environment, being tied closely to Word-XML and a SQL database engine.  It is very flexible in handling a number of the features typically handled by major Macro-packs like SoftWise, or numbering and metadata cleaners like those from Payne Consulting and Levitt & James and WorkShare. Its strength is as a Word add-on and clause management structure.  However, it is weak in handling complex logic and dialog scripting.  Rather than presenting dialogs, the D3 assembler presents the “document” as a living editable template, and then steps through the document, presenting questions seriatim as the user walks through the document.  These fields are stored as WordML tags which can be “reassembled”.  Viewed this way, it is more of an enhanced document builder tool than an interview-driven document assembler.

DealBuilder just announced the release of DealBuilder 2.7, which brings to market more than 500 new features.  Key new features include a new web-based data reporting application, enhanced end-user experience on DealBuilder questionnaires, expanded use of mark-up within DealBuilder Master Documents, additional Administration features and a new, easy to deploy DealBuilder.Server installer.  We will be announcing shortly a major DealBuilder online system which we designed and built.  It is a world-class product with even more power.  Its relevance engine is a major benefit for those authors who have not mastered (or choose not to master) dialog scripting.  The system does, however, handle incredibly complex rule structures, and resolves them to determine and ask only those variables relevant to the current answer set in use.

Perfectus has recently released a new build.  It has a powerful GUI for building Interviews.  It has powerful template set, workflow, and document management tools built into the product that make it a total out-of-the-box on-line solution. The tools are all .NET and XML and fully addressable.  There is a great GUI with drag and drop development.  Simple templates can be built rapidly.  More complex business logic can be built into the system.  The one drawback is that each unique rule has to be tagged and named.  Since it is using XML tags instead of pure text markup, as GhostFill and HotDocs currently do (or as the DealBuilder author supports), the developer is limited by the way XML allows tags to be named.

iXIO’s Q-Shift is like an online version of D3.  It has a document parsing tool that takes a Word document and turns it into an on-line document model.  The paragraphs are turned into entries in a master clause bank that can be pulled together on the fly.  Clauses can be conditional, or required, at the designer’s election.  You can preview the clauses and build your document from the model.  Like D3, Q-Shift lacks support for Dialogs … it presents the variables in single-variable dialog boxes as it runs through the assembly, and has limited support for complex business logic.

For additional information, please visit our document assembly videos where we showcase a number of applications of these products. Video Tours