DI Documentation

This is a second proposal, which mainly clarifies our initial proposal and addresses some concerns that were raised.

To recap our initial proposal:

  1. we are proposing a documentation solution where documentation is written in markdown and kept in git
  2. the documentation is published as static HTML at https://documentatie.di.huc.knaw.nl via a static site generator (mkdocs) designed for technical documentation. This deployment is hosted in our Kubernetes production cluster.
  3. a continuous deployment pipeline on our private Gitlab automatically deploys the site after every commit to the main branch
  4. on the documentation site, we distinguish public documentation from internal (DI-only) documentation, which will be behind OAuth2 authentication. Anything that matches the path */internal/* will be considered internal. This makes it easy to use the same repository for both internal documentation and documentation we want to share with the broader outside world.
  5. the primary source for the documentation is an internal git repository, but through the use of git submodules any git-hosted and markdown-based documentation can be pulled in. This means different documentation can live decentralised in multiple source repositories and be worked on by different teams of people, whilst still being viewable via our documentation website. We called this flexible aggregation of documentation-islands in our initial proposal.
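
As an illustration only, the build steps behind points 2, 3 and 5 could be sketched in Python as follows. The actual Gitlab CI configuration is not part of this proposal, so the function names and the `public` output directory are assumptions; only the `git submodule update` and `mkdocs build` commands themselves are standard.

```python
import subprocess
from typing import List


def pipeline_commands(site_dir: str = "public") -> List[List[str]]:
    """Return the commands a deployment job might run, in order.

    Hypothetical sketch: the real pipeline may use different steps.
    """
    return [
        # Fetch every documentation island included as a git submodule.
        ["git", "submodule", "update", "--init", "--recursive"],
        # Render the markdown sources to static HTML in site_dir.
        ["mkdocs", "build", "--site-dir", site_dir],
    ]


def run_pipeline(site_dir: str = "public") -> None:
    """Run each step and stop at the first failing command."""
    for command in pipeline_commands(site_dir):
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the root of the documentation repository would first pull in all submodules and then build the site; publishing the resulting directory to the cluster is left out of this sketch.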

Scope

There are a few points we want to clarify. There is some confusion around what kind of documentation the DI-documentation site is intended for and for whom it is intended, as we target both internal and external readers.

The idea behind the DI-documentation is to have a simple fallback solution for documentation that all DI-ers can resort to. So if there is a need for documentation and no obviously better place to store it, then the DI-documentation site is a good place. If any projects already have other documentation solutions in place that are deemed more appropriate, then by all means use those (perhaps only consider placing a link in the DI-documentation so it is easily findable). Our goal is not to force a single solution on everyone; on the contrary, we chose the git and markdown option precisely because it allows a very distributed and decentralised documentation approach. You can see git and markdown as a lowest common denominator.

The internal git repository is only accessible to DI-ers. This is our primary audience when it comes to editors. However, external parties can contribute too, as we propose a mechanism where they can simply use their own git repositories, each with their own access rights. These will be included in the main repository via git submodules. This allows for documentation to come from multiple sources and be published in multiple places (mirrors) if need be (in different styles).
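
To make the submodule mechanism a little more concrete, here is a minimal sketch that lists which external sources a main repository aggregates by reading its `.gitmodules` file. The sample content, repository URLs and paths are hypothetical and only serve as an example; the real repository layout may differ.

```python
import configparser
from typing import Dict

# Hypothetical .gitmodules content: one entry per documentation island.
SAMPLE_GITMODULES = """\
[submodule "docs/team-a"]
    path = docs/team-a
    url = https://git.example.org/team-a/handbook.git
[submodule "docs/partner-x"]
    path = docs/partner-x
    url = https://git.example.org/partner-x/docs.git
"""


def list_sources(gitmodules_text: str) -> Dict[str, str]:
    """Map each submodule path in the main repository to its source URL."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitmodules_text)
    return {
        parser[section]["path"]: parser[section]["url"]
        for section in parser.sections()
    }


for path, url in list_sources(SAMPLE_GITMODULES).items():
    print(f"{path} <- {url}")
```

Each source repository keeps its own access rights; the main repository only records where the content comes from and where it is mounted.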

The primary goal is to have a place to hold our own internal documentation. This means that DI-ers are also the primary audience as readers. Think about things like (this list is not normative, we are open to see how things develop):

  • technical guidelines about using various systems (git, kubernetes, gitlab, gitlab CI, VPN)
  • procedures such as how to request subdomains and policies such as URL-naming conventions
  • meta documentation: how to document things, plus lists and documents like the one you are reading now
  • fallback place for software documentation
  • fallback place for project documentation
  • static output documents (specifications, tutorials, etc) from e.g. DI working groups
  • possibly meeting minutes/proceedings (if not sensitive)

These are currently either entirely undocumented, kept in the old Confluence instance, or scattered across different platforms (and therefore hard to find).

The second notable audience is external parties or the broad general public. We want DI-ers to be able to quickly share documentation with external partners and the general public. A lot of documentation, after all, has a degree of general applicability or is not only for direct colleagues. Note that we do not make any further distinction when it comes to read access: things are either internal (DI-only) or public. Fine-grained read-access control is out of scope.

Also explicitly out of scope is any content that is highly security- or privacy-sensitive (credentials, confidential minutes, etc.).

A last function we hope to provide is that of an entry portal to DI documentation: even if documentation lives entirely elsewhere and is not included as a git submodule, DI-ers are requested to simply add a link to it in the documentation menu. This facilitates findability whilst still allowing everybody to host documentation wherever they want. We hope that this reduces the overall confusion that is now present.

Addressing concerns

Learning curve for git and markdown for non-technical people

This is certainly true, especially for git (markdown is easy enough to pick up). However, I am assuming that most DI-ers already have this skill, so this is a natural extension of the best practices we already use for software projects, where lots of technical documentation is written in this way. It is also a skill which I think we would want to promote internally.

To accommodate less technical users, we suggested the option of offering a headless CMS, providing a WYSIWYG way of editing. This adds an extra abstraction layer around the markdown syntax and the underlying git store so they don't have to worry about those.

No real-time collaborative editing

Though git offers all the needed functionality for concurrent editing (and proper attribution), it doesn't lend itself easily to real-time collaboration (i.e. concurrent editing like in Google Docs). I'm not sure to what extent this is an important feature for documentation editors.

If it is, perhaps it is something that could be provided by a headless CMS; I have not investigated this yet. Real-time collaborative editing may be most useful during a meeting. If this cannot be facilitated by a headless CMS, simply copying from and to a collaborative markdown-editing platform like HedgeDoc might already satisfy this use case for some.

As to collaboration in general, git already works in such a way that it is extremely hard to inadvertently overwrite the changes of others (unlike in collaborative editing). All DI-ers have direct push access, and merge requests are available in situations where a review is requested. Merge conflicts are easily resolvable, though doing so requires some expertise.

Existing documentation data in Confluence

Our working assumption is that our current Confluence instance is end-of-life and will at some point disappear or become unsustainable. What the exact implications are and where documentation should move is not decided, nor is it up to us to decide. Our proposal just addresses a solution for a subset of the documentation there (our own documentation). By no means do we aim to offer a solution that necessarily fits what institutes like IISG do, or one that covers everything that is currently in Confluence.

Whatever we choose, moving old data out of Confluence and converting it for a new system is not trivial (though Atlassian does have a migration path if you stick to their solutions).

Authentication

The major technical bottleneck has been to have Team CI implement an authentication layer so we can separate the site into public vs private using a simple URL-based mechanism. This is now solved. Internal pages require OAuth2 (your KNAW account); the rest is public by default.
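
The URL-based rule can be illustrated with a small sketch. This is not the code of the authentication layer implemented by Team CI, which is not shown here; it only assumes that the `*/internal/*` rule from the recap behaves like an ordinary shell-style wildcard. The function name `requires_login` is hypothetical.

```python
from fnmatch import fnmatchcase

# Rule as stated in the proposal: anything matching */internal/* is internal.
INTERNAL_PATTERN = "*/internal/*"


def requires_login(url_path: str) -> bool:
    """Return True if a page at url_path would sit behind OAuth2.

    Assumption: the pattern is a shell-style wildcard in which `*`
    also matches slashes and the empty string.
    """
    return fnmatchcase(url_path, INTERNAL_PATTERN)


for path in ["/internal/vpn/", "/guides/internal/vpn.html", "/guides/git/"]:
    label = "internal" if requires_login(path) else "public"
    print(f"{path}: {label}")
```

Under this reading, a page becomes internal simply by placing it in any directory named `internal`, at any depth. Edge cases such as a bare `/internal` without a trailing slash may be handled differently by the real authentication layer.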

Pros and Cons

The following table lists some pros and cons of our solution vs the existing Confluence solution and possible 3rd party cloud solutions. The findings have a fair degree of subjectivity as they're written from our perspective:

| | Our git+md solution | Current Confluence | 3rd party cloud solution |
|---|---|---|---|
| Self-hosted | ✅ yes | ✅ yes | ❌ no |
| Data ownership | ✅ own | ✅ own | ❌ external company |
| Data in EU? | ✅ yes | ✅ yes | ❌ often not, maybe |
| Financial costs | ✅ none | ❌ yes, license | ❌ yes |
| Open source? | ✅ yes | ❌ no, proprietary | ❌ usually not |
| Vendor lock-in | ✅ minimal | ❌ large | ❌ often maximal |
| Decentralised sources | ✅ yes | ❌ no | ❌ no |
| Offline usage | ✅ full (r/w) | ❌ no | ❌ no |
| WYSIWYG | ❌ no, but ✅ with CMS | ✅ yes | ✅ yes |
| Markup syntax | ✅ markdown (common) | ❌ own markup | ? varies |
| CMS mandatory | ✅ never | ❌ yes | ❌ probably yes |
| Real-time collaborative editing | ❌ no (maybe with CMS?) | ❌ no | ✅ probably yes |
| Learning curve | ❌ high, but ✅ ok with CMS | ✅ low | ✅ low |
| Long-term sustainability | ✅ good | ❌ very bad (EOL) | ❌ very risky |

Some further clarification:

  • Self-hosted & Data ownership: We think cloud solutions are undesirable for hosting internal documentation, because they would hand possibly private data to an external company. This is even more urgent in the current political climate and if the solution is US-based, as many are. Self-hosting also has the advantage of keeping important expertise in-house, rather than outsourcing it.
  • Vendor lock-in & Sustainability: As we are git and markdown based, we're not even tied that strongly to our current SSG of choice (mkdocs). There is less vendor lock-in, and moving to another SSG always remains an option without too much effort. Our solution attempts to decouple as much as possible, so that components are relatively interchangeable. Higher-level abstractions (like a CMS) can be stacked on top. The data is not strongly tied to the editor application, and only very loosely tied to the SSG (some metadata).
  • WYSIWYG & CMS mandatory: An integrated solution like Confluence, especially if cloud-based, is indeed user-friendly for non-technical users, but the flip side is that it is less friendly to technical users: those too will be forced to work via a single web-based application.
  • WYSIWYG: git forges like Gitlab, Github and Forgejo already have the means to edit and preview markdown in a more user-friendly fashion. So even without a dedicated headless CMS (which we agree is a good idea), there are already options for WYSIWYG-editing in place.
  • Decentralised sources: We allow documentation to live in different places (=git repositories), each with their own access rights. We use git submodules to simply include those where appropriate.

Feedback

Initial feedback in June 2025 from the MT was positive, citing "broad consensus about the markdown and git approach". Some of the concerns raised back then have already been addressed in the latest version of the documentation as well as in this second proposal, such as more clarity around the public vs. internal distinction and a definition of the scope. We do see this as a moving experiment, so the precise scope will become apparent only when people start to use it.

After requesting feedback from the various teams we unfortunately only received feedback from one team (in December 2025). That feedback, however, was quite extensive, for which we are very grateful. It was critical of the git and markdown approach, citing concerns about non-technical users and real-time collaborative editing. We hope to have motivated our choices a bit better in this second proposal, and we also hope to have sketched the undesirability of the Confluence cloud approach they'd prefer.

The status and membership of the documentation workgroup is a bit vague and activity has declined. Rather than attempting to formalize this, it's probably better just to keep it fairly informal and invite feedback from anyone who feels they have something to contribute. Use the #di-documentation channel on Slack, e-mail, or the issue tracker on our internal Gitlab.

This second proposal was authored and signed off by Maarten van Gompel (TT), Jente Buijs (TCI) and Mariëlle Veldhuis (TP).