DITA specialization imposes certain restrictions. An inherent challenge in designing
DITA vocabulary modules and document types is understanding how to satisfy markup
requirements
within those restrictions and, when the requirements cannot be met by a design that
fully
conforms to the DITA architecture, how to create customized document types that diverge
from the
DITA standard as little as possible.
DITA imposes the following structural restrictions:
- All topics must have titles.
- Topic body content must be contained within a body element.
- Section elements cannot nest.
- Metadata specific to an element type must be represented using elements, not
attributes.
When markup requirements cannot be met within the DITA architecture, there still might
be an
interest in using DITA features and technology, or a business need for interoperability
with
conforming DITA documents and processors. In this case, the solution is to create
customized document types. Customized document types are document types that do
not conform to the DITA standard. To reduce the cost of producing conforming documents
from
non-conforming documents, custom document types should minimize the extent to which
they
diverge from the DITA standard.
Typical reasons for considering custom document types include the following:
- Optimizing markup for authoring
- Supporting legacy markup structures that are not consistent with DITA structural rules,
for example, footnotes within titles
- Defining different forms of existing structures, such as lists, where the DITA-defined
structures are too constrained
- Providing attributes required by specific processors, such as CMS-defined attributes
for
maintaining management metadata
- Embedding tool-imposed markup in places that do not allow the
<foreign> or <unknown> elements
Remember that customized document types do not conform to the DITA standard, and thus
are not
considered DITA. In many of the cases above, it is possible to define document types
that
conform to the DITA standard. Explore this fully before developing customized document
types.
Optimizing document types
Conforming DITA grammar files are modular, which facilitates exchange of vocabulary
modules
and constraints and simplifies the process of assembling document type shells. In
some cases
there might be a reason to avoid the modular approach and use an optimized document
type
composed of a single file (or a smaller number of files). For example, this could
be
advantageous in situations where validation occurs over a network.
In an optimized DTD, entities might also be resolved to further optimize processing
or
validation. This could speed up processing for environments that process and validate
large
numbers of DITA maps and topics.
An optimized document type still allows the creation of conforming DITA content that
follows all other rules in the DITA specification. In these cases it is still possible
to maintain a parallel, modular document type that conforms completely to standard DITA
coding practices. Maintaining such a conforming copy helps ensure that the optimized
document type still conforms to the standard, and might also ease interchange with
tools that expect conforming document types.
Creating custom document types for non-standard markup
When the relaxed content models for DITA elements are inappropriately open for authoring
purposes, document type shells can remove undesirable domains or use constraint modules
to
restrict content models. If content models are not relaxed enough, and markup requirements
include content models that are less constrained than those defined by DITA, custom
document types might be the only option.
Customized document types do not conform to the DITA standard. Preprocessing can ensure
compatibility with existing publishing processes, but it does not ensure compatibility
with
DITA-supporting authoring tools or content management systems. However, when an
implementation is being heavily customized, a customized document type can help isolate
and
control the consequences of non-standard design.
For example, if an authoring group requires the <p> element to be spelled out as
<paragraph>, the document type could be customized to change <p> to <paragraph> for
authoring purposes. Such documents could then be preprocessed to rename <paragraph>
back to <p> before they are fed into a standard DITA publishing process.
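The renaming step described above can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the RENAMES table and the function name rename_custom_elements are hypothetical, and the sketch assumes the documents are well-formed XML that fits in memory.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from authoring-only element names to standard DITA names.
RENAMES = {"paragraph": "p"}


def rename_custom_elements(xml_text: str, renames=RENAMES) -> str:
    """Return xml_text with each customized element renamed to its DITA equivalent."""
    root = ET.fromstring(xml_text)
    # iter() walks the root and every descendant in document order.
    for elem in root.iter():
        if elem.tag in renames:
            elem.tag = renames[elem.tag]
    return ET.tostring(root, encoding="unicode")


source = (
    '<topic id="t1"><title>Example</title>'
    "<body><paragraph>Hello</paragraph></body></topic>"
)
print(rename_custom_elements(source))
```

A real preprocessing step would also need to replace the document type declaration, because the parser used here does not carry it through to the output.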
Because DITA document types are designed to enable constraints, such customized documents
can still take advantage of existing override schemes. While still not valid DITA,
a
document type shell could be set up that implements local requirements (such as adding
global CMS-defined attributes), and then imports an otherwise valid document type
shell.
This helps isolate non-compliant portions of the document type, while reusing as much
as
possible of the original DITA grammar.
Specialization design considerations
Requirements for new markup often appear to be incompatible with DITA architectural
rules
or existing markup, especially when mapping existing non-DITA markup practice to DITA,
where
the existing markup might have used structures that cannot be
directly expressed in DITA. For example, you might need markup for a specialized form
of
list where the details are not consistent with the base model for DITA lists.
In this case you have two alternatives, one that conforms to DITA and one that does
not:
- Specialize from more generic base elements or attributes.
- Define non-conforming structures and map them to conforming DITA structures as necessary
for processing by DITA-aware processors or for interchange as conforming DITA
documents.
Specializing from more generic base elements, such as defining a list using specializations
of <ph> or <div>, while technically
conforming, might still impede interchange of such documents. Generic DITA processors
will
have no way of knowing that what they see as a sequence of phrases or divisions is
really a
list and should be rendered in a manner similar to standard DITA lists. However, your
documents will be reliably interchangeable with conforming DITA systems.
Defining non-conforming markup structures means that the resulting documents are not
conforming DITA documents. They cannot be reliably processed by generic DITA-aware
processors or interchanged with other DITA systems. However, as long as the documents
can be
transformed into conforming DITA documents without undue effort, interchange and
interoperation requirements can be satisfied as needed. For example, a content management
system could add its own required markup for management metadata, but strip the metadata
when delivering content to conforming DITA processors.
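As a rough sketch of the stripping step in that example, the following Python removes management attributes before content is handed to a conforming processor. The "cms-" prefix and the function name strip_cms_attributes are assumptions made for illustration; an actual content management system would define its own attribute names.

```python
import xml.etree.ElementTree as ET

# Assumption: the CMS marks its management attributes with a "cms-" prefix.
CMS_PREFIX = "cms-"


def strip_cms_attributes(xml_text: str, prefix: str = CMS_PREFIX) -> str:
    """Return xml_text with every attribute whose name starts with prefix removed."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Collect the names first so the attribute dict is not mutated while iterating.
        for name in [n for n in elem.attrib if n.startswith(prefix)]:
            del elem.attrib[name]
    return ET.tostring(root, encoding="unicode")


managed = '<topic id="t1" cms-owner="docs"><title cms-rev="7">Example</title></topic>'
print(strip_cms_attributes(managed))
```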
Note that even if one uses the DITA-defined types as a starting point, any change
to those
base types not accomplished through specialization or the constraint feature defines
a
completely new document type that has no normative relationship to the DITA document
types,
and cannot be considered in any way to be a conforming DITA application. In particular,
the
use of DITA specialization from non-DITA base types does not produce DITA-conforming
vocabularies.
Specialize from generic elements or attributes
Most DITA element types have relaxed content models that are specifically designed
to allow
a wide set of options when specializing from them. However, some DITA element types
do
impose limits that might not be acceptable or appropriate for a specific markup application.
In this case, consider specializing from a more generic base element or attribute.
Generic elements are available in DITA at every level of detail, from whole topics
down to
individual keywords, and the generic @base attribute is available for
attribute domain specialization.
For example, if you want to create a new kind of list but cannot usefully do so
specializing from <ul>, <ol>,
<sl>, or <dl>, you can create a new set of
list elements by specializing nested <div> elements. This new list
structure will require specialized processing to generate appropriate output styling,
because it is not semantically tied to the other lists by ancestry. Nevertheless,
it will
remain a valid DITA specialization, with the standard support for generalization,
content
referencing, conditional processing, and more.
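To illustrate why such a list remains usable by generic tools, the sketch below generalizes specialized elements back to their base type by reading the @class attribute. The element names mylist and myitem and the module name mylist-d are hypothetical, and the sketch assumes the @class values are present in the parsed instance (in practice they are usually defaulted by the grammar files, which this parser does not read).

```python
import xml.etree.ElementTree as ET


def generalize(elem: ET.Element) -> None:
    """Rename elem and its descendants to the most general type named in @class."""
    for node in elem.iter():
        tokens = node.get("class", "").split()
        # tokens[0] is the "+" or "-" marker; tokens[1] is the base "module/type" pair.
        if len(tokens) >= 2 and "/" in tokens[1]:
            node.tag = tokens[1].split("/", 1)[1]


specialized = (
    '<mylist class="+ topic/div mylist-d/mylist ">'
    '<myitem class="+ topic/div mylist-d/myitem ">First</myitem>'
    "</mylist>"
)
root = ET.fromstring(specialized)
generalize(root)
print(root.tag, [child.tag for child in root])
```

The @class values are left untouched, so the original specialized names can still be recovered from the generalized document.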
The following base elements in <topic> are generic enough to support
almost any structurally-valid DITA specialization:
- <topic>: Any content unit that has a title and associated content
- <section>: Any non-nesting division of content within a topic, titled or not
- <p>: Any non-nesting non-titled block of content below the section level
- <fig>: Any titled block of content below the section level
- <ul>, <ol>, <dl>, <sl>, <simpletable>: Any structured block of content that consists
of listed items in one or more columns
- <ph>: Any division of content below the paragraph level
- <text>: Text within a phrase
- <keyword>: Any non-nesting division of content below the paragraph level
- <data>: Any content that acts as metadata rather than core topic or map content
- <foreign>: Any content that already has a non-DITA markup standard, but still needs
to be authored as part of the DITA document. Processors should attempt to render this
element, if at all possible.
- <unknown>: Any non-standard markup that does not fit the DITA model, but needs to be
managed as part of a DITA document. Processors should not attempt to render this
element.
- <bodydiv>: A generic, untitled, nestable container for content within topic bodies
- <sectiondiv>: A generic, untitled, nestable container for content within sections
- <div>: A generic, untitled, nestable container for content within topic bodies or
sections
The following attributes in topic are suitable for domain specialization to provide
new
attributes that are required throughout a document type:
- @props: Any new conditional processing attribute
- @base: Any new attribute that is universally available, has a simple syntax
(space-delimited alphanumeric values), and does not already have a semantic equivalent
Whenever possible, specialize from the element or attribute that is the closest semantic
match.