Specializing domains in DITA
The Darwin Information Typing Architecture (DITA) is an XML architecture for extensible technical information. A domain extends DITA with a set of elements whose names and content models are unique to an organization or field of knowledge. Architects and authors can combine elements from any number of domains, leading to great flexibility and precision in capturing the semantics and structure of their information. In this overview, you learn how to define your own domains.
Introducing domain specialization
In DITA, the topic is the basic unit of processable content. The topic provides the
title, metadata, and structure for the content. Some topic types provide very simple
content structures. For example, the concept
topic has a single concept body for all of the concept content. By contrast, a task
topic articulates a structure that distinguishes pieces of the task content, such
as the prerequisites, steps, and results.
In most cases, these topic structures contain content elements that are not specific
to the topic type. For example, both the concept body and the task prerequisites permit
common block elements such as p
paragraphs and ul
unordered lists.
Domain specialization lets you define new types of content elements independently of topic type. That is,
you can derive new phrase or block elements from the existing phrase and block elements.
You can use a specialized content element within any topic structure where its base
element is allowed. For instance, because a p
paragraph can appear within a concept body or task prerequisite, a specialized paragraph
could appear there, too.
Here's an analogy from the kitchen. You might think of topics as types of containers for preparing food in different ways, such as a basic frying pan, blender, and baking dish. The content elements are like the ingredients that go into these containers, such as spices, flour, and eggs. The domain resembles a specialty grocer who provides ingredients for a particular cuisine. Your pot might contain chorizo from the carnicería when you're cooking TexMex or risotto when you're cooking Italian. Similarly, your topics can contain elements from the programming domain when you're writing about a programming language or elements from the UI domain when you're writing about a GUI application.
DITA has broad tastes, so you can mix domains as needed. If you're describing how to program GUI applications, your topics can draw on elements from both the programming and UI domains. You can also create new domains for your content. For instance, a new domain could provide elements for describing hardware devices. You can also reuse new domains created by others, expanding the variety of what you can cook up.
In a more formal definition, topic specialization starts with the containing element and works from the top down. Domain specialization, on the other hand, starts with the contained element and works from the bottom up.
Understanding the base domains
A DITA domain collects a set of specialized content elements for some purpose. In effect, a domain provides a specialized vocabulary. With the base DITA package, you receive the following domains:
Domain | Purpose |
---|---|
highlight | To highlight text with styles such as bold, italic, and monospace |
programming | To define the syntax and give examples of programming languages |
software | To describe the operation of a software program |
UI | To describe the user interface of a software program |
In most domains, a specialized element adds semantics to the base element. For example,
the apiname
element of the programming domain extends the basic keyword
element with the semantic of a name within an API.
The highlight domain is a special case. The elements in this domain provide styled presentation instead of semantic or structural markup. The highlight styles give authors a practical way to mark up phrases for which a semantic has not been defined.
Providing such highlight styles through a domain resolves a long-standing dispute for publication DTDs. Purists can omit the highlight domain to enforce documents that should be strictly semantic. Pragmatists can include the highlight domain to provide expressive flexibility for real-world authoring. A semipragmatist could even include the highlight domain in conceptual documents to support expressive authoring but omit the highlight domain from reference documents to enforce strict semantic tagging.
More generally, you can define documents with any combination of domains and topics. As we'll see in Generalizing a domain, the resulting documents can still be exchanged.
Combining an existing topic and domain
The DITA package provides a DTD for each topic type and an omnibus DTD (ditabase.dtd) that defines all of the topic types. Each of these DTDs includes all of the predefined DITA domains. Thus, topics written against one of the supplied DTDs can use all of the predefined domain specializations.
Behind the scenes, a DITA DTD is just a shell. Elements are actually defined in other modules, which are included in the DTD. Through these modules, DITA provides you with the building blocks to create new combinations of topic types and domains.
When you add a domain to your DITA installation, the new domain provides you with additional modules. You can use the additional modules to incorporate the domain into the existing DTDs or to create new DTDs.
In particular, each domain is implemented with two files:
- A file that declares the entities for the domain. This file has the .ent extension.
- A file that declares the elements for the domain. This file has the .mod extension.
As an example, let's say we're authoring the reference topics for a programming language. We're purists about presentation, so we want to exclude the highlight domain. We also have no need for the software or UI domains in this reference. We could address this scenario by defining a new shell DTD that combines the reference topic with the programming domain, excluding the other domains.
A shell DTD has a consistent design pattern with a few well-defined sections. The instructions in these sections perform the following actions:
- Declare the entities for the domains.
  In the scenario, this section would include the programming domain entities:
      <!ENTITY % pr-d-dec PUBLIC "-//IBM//ENTITIES DITA Programming Domain//EN" "programming-domain.ent">
      %pr-d-dec;
- Redefine the entities for the base content elements to add the specialized content elements from the domains.
  This section is crucial for domain specialization. Here, the design pattern makes use of two kinds of entities. Each base content element has an element entity to identify itself and its specializations. Each domain provides a separate domain specialization entity to list the specializations that it provides for a base element. By combining the two kinds of entities, the shell DTD allows the specialized content elements to be used in the same contexts as the base element.
  In the scenario, the pre element entity identifies the pre element (which, as in HTML, contains preformatted text) and its specializations. The programming domain provides the pr-d-pre domain specialization entity to list the specializations for the pre base element. The same pattern is used for the other base elements specialized by the programming domain:
      <!ENTITY % pre "pre | %pr-d-pre;">
      <!ENTITY % keyword "keyword | %pr-d-keyword;">
      <!ENTITY % ph "ph | %pr-d-ph;">
      <!ENTITY % fig "fig | %pr-d-fig;">
      <!ENTITY % dl "dl | %pr-d-dl;">
  To learn which content elements are specialized by a domain, you can look at the entity declaration file for the domain.
- Define the domains attribute of the topic elements to declare the domains represented in the document.
  Like the class attribute, the domains attribute identifies dependencies. Where the class attribute identifies base elements, the domains attribute identifies the domains available within a topic. Each domain provides a domain identification entity to identify itself in the domains attribute.
  In the scenario, the only topic is the reference topic. The only domain is the programming domain, which is identified by the pr-d-att domain identification entity:
      <!ATTLIST reference domains CDATA "&pr-d-att;">
- Redefine the info-types entity to specify the topic types that can be nested within a topic.
  In the scenario, this section would declare the reference topic:
      <!ENTITY % info-types "reference">
- Define the elements for the topic type, including the base topics.
  In the scenario, this section would include the base topic and reference topic modules:
      <!ENTITY % topic-type PUBLIC "-//IBM//ELEMENTS DITA Topic//EN" "topic.mod">
      %topic-type;
      <!ENTITY % reference-typemod PUBLIC "-//IBM//ELEMENTS DITA Reference//EN" "reference.mod">
      %reference-typemod;
- Define the elements for the domains.
  In the scenario, this section would include the programming domain definition module:
      <!ENTITY % pr-d-def PUBLIC "-//IBM//ELEMENTS DITA Programming Domain//EN" "programming-domain.mod">
      %pr-d-def;
Often, it is easiest to work by copying an existing DTD and adding or removing topics or domains. In the scenario, it would be easiest to start with reference.dtd and remove the highlight, software, and UI domains. In the listing below, the parts to remove are the ones that reference the hi-d, sw-d, and ui-d entities:
    <!--vocabulary declarations-->
    <!ENTITY % ui-d-dec PUBLIC "-//IBM//ENTITIES DITA User Interface Domain//EN" "ui-domain.ent">
    %ui-d-dec;
    <!ENTITY % hi-d-dec PUBLIC "-//IBM//ENTITIES DITA Highlight Domain//EN" "highlight-domain.ent">
    %hi-d-dec;
    <!ENTITY % pr-d-dec PUBLIC "-//IBM//ENTITIES DITA Programming Domain//EN" "programming-domain.ent">
    %pr-d-dec;
    <!ENTITY % sw-d-dec PUBLIC "-//IBM//ENTITIES DITA Software Domain//EN" "software-domain.ent">
    %sw-d-dec;
    <!--vocabulary substitution-->
    <!ENTITY % pre "pre | %pr-d-pre; | %sw-d-pre;">
    <!ENTITY % keyword "keyword | %pr-d-keyword; | %sw-d-keyword; | %ui-d-keyword;">
    <!ENTITY % ph "ph | %pr-d-ph; | %sw-d-ph; | %hi-d-ph; | %ui-d-ph;">
    <!ENTITY % fig "fig | %pr-d-fig;">
    <!ENTITY % dl "dl | %pr-d-dl;">
    <!--vocabulary attributes-->
    <!ATTLIST reference domains CDATA "&ui-d-att; &hi-d-att; &pr-d-att; &sw-d-att;">
    <!--Redefine the infotype entity to exclude other topic types-->
    <!ENTITY % info-types "reference">
    <!--Embed topic to get generic elements -->
    <!ENTITY % topic-type PUBLIC "-//IBM//ELEMENTS DITA Topic//EN" "topic.mod">
    %topic-type;
    <!--Embed reference to get specific elements -->
    <!ENTITY % reference-typemod PUBLIC "-//IBM//ELEMENTS DITA Reference//EN" "reference.mod">
    %reference-typemod;
    <!--vocabulary definitions-->
    <!ENTITY % ui-d-def PUBLIC "-//IBM//ELEMENTS DITA User Interface Domain//EN" "ui-domain.mod">
    %ui-d-def;
    <!ENTITY % hi-d-def PUBLIC "-//IBM//ELEMENTS DITA Highlight Domain//EN" "highlight-domain.mod">
    %hi-d-def;
    <!ENTITY % pr-d-def PUBLIC "-//IBM//ELEMENTS DITA Programming Domain//EN" "programming-domain.mod">
    %pr-d-def;
    <!ENTITY % sw-d-def PUBLIC "-//IBM//ELEMENTS DITA Software Domain//EN" "software-domain.mod">
    %sw-d-def;
Creating a domain specialization
For some documents, you may need new types of content elements. In a common scenario, you need to mark up phrases that have special semantics. You can handle such requirements by creating new specializations of existing content elements and providing a domain to reuse the new content elements within topic structures.
As an example, let's say we're writing the documentation for a class library. We intend to write processes that will index the documentation by class, field, and method. To support this processing, we need to mark up the names of classes, fields, and methods within the topic content, as in the following sample:
    <p>The <classname>String</classname> class provides the <fieldname>length</fieldname> field and the <methodname>concatenate()</methodname> method.</p>
We must define new content elements for these names. Because the names are special
types of names within an API, we can specialize the new elements from the apiname
element provided by the programming domain.
The design pattern for a domain requires an abbreviation to represent the domain. A sensible abbreviation for the class library domain might be cl. The identifier for a domain consists of the abbreviation followed by -d (for domain).
As noted in Combining an existing topic and domain, the domain requires an entity declaration file and an element definition file.
Writing the entity declaration file
The entity declaration file has sections that perform the following actions:
- Define the domain specialization entities.
  A domain specialization entity lists the specialized elements provided by the domain for a base element. For clarity, the entity name is composed of the domain identifier and the base element name. The domain provides domain specialization entities for ancestor elements as well as base elements.
  In the scenario, the domain defines a domain specialization entity for the apiname base element as well as the keyword ancestor element (which is the base element for apiname):
      <!ENTITY % cl-d-apiname "classname | fieldname | methodname">
      <!ENTITY % cl-d-keyword "classname | fieldname | methodname">
- Define the domain identification entity.
  The domain identification entity lists the topic type, the current domain, and any other domains on which the current domain depends. Each domain is identified by its domain identifier. The list is enclosed in parentheses. For clarity, the entity name is composed of the domain identifier and -att.
  In the scenario, the class library domain has a dependency on the programming domain, which provides the apiname element:
      <!ENTITY cl-d-att "(topic pr-d cl-d)">
The complete entity declaration file would look as follows:
    <!ENTITY % cl-d-apiname "classname | fieldname | methodname">
    <!ENTITY % cl-d-keyword "classname | fieldname | methodname">
    <!ENTITY cl-d-att "(topic pr-d cl-d)">
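To make the roles of these entities concrete, the following Python sketch scans the declarations above and reports which base elements the domain specializes and which other domains it depends on. It is an illustration only and not part of the DITA package: the function name summarize_domain and the regular expression are assumptions, and the pattern handles only simple double-quoted declarations like the ones shown here.

```python
import re

# The complete entity declaration file for the class library domain.
ENT_FILE = '''
<!ENTITY % cl-d-apiname "classname | fieldname | methodname">
<!ENTITY % cl-d-keyword "classname | fieldname | methodname">
<!ENTITY cl-d-att "(topic pr-d cl-d)">
'''

# Matches parameter entities (with "%") and general entities (without).
DECL = re.compile(r'<!ENTITY\s+(%\s+)?([\w.-]+)\s+"([^"]*)"\s*>')


def summarize_domain(ent_text, domain_id):
    """Return (specializations, dependencies) for one domain.

    specializations maps each base or ancestor element to the elements
    the domain provides for it; dependencies lists the other domains
    named in the domain identification entity.
    """
    specializations = {}
    dependencies = []
    for is_param, name, value in DECL.findall(ent_text):
        if is_param and name.startswith(domain_id + "-"):
            # Domain specialization entity: <domain id>-<base element>
            base = name[len(domain_id) + 1:]
            specializations[base] = [e.strip() for e in value.split("|")]
        elif name == domain_id + "-att":
            # Domain identification entity: "(topic-type dep ... domain)"
            ids = value.strip("()").split()
            dependencies = [i for i in ids[1:] if i != domain_id]
    return specializations, dependencies


specs, deps = summarize_domain(ENT_FILE, "cl-d")
print(specs)
print(deps)
```

A shell DTD author could use the dependency list as a reminder of the domains that must be included alongside the class library domain.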
Writing the element definition file
The element definition file has sections that perform the following actions:
- Define the content element entities for the elements introduced by the domain.
  These entities permit other domains to specialize from the elements of the current domain.
  In the scenario, the class library domain follows this practice so that additional domains can be added in the future. The domain defines entities for the three new elements:
      <!ENTITY % classname "classname">
      <!ENTITY % fieldname "fieldname">
      <!ENTITY % methodname "methodname">
- Define the elements.
  The specialized content model must be consistent with the content model for the base element. That is, any possible contents of the specialized element must be generalizable to valid contents for the base element. Within that limitation, considerable variation is possible. Specialized elements can be substituted for elements in the base content model. Optional elements can be omitted or required. An element with multiple occurrences can be replaced with a list of specializations of that element, and so on.
  The specialized content model should always identify elements through the element entity rather than directly by name. This practice lets other domains merge their specializations into the current domain.
  In the scenario, the elements have simple character content:
      <!ELEMENT classname (#PCDATA)>
      <!ELEMENT fieldname (#PCDATA)>
      <!ELEMENT methodname (#PCDATA)>
- Define the specialization hierarchy for each element with the class attribute.
  For a domain element, the value of the attribute must start with a plus sign. Elements provided by domains should be qualified by the domain identifier.
  In the scenario, the specialization hierarchies include the keyword ancestor element provided by the base topic and the apiname element provided by the programming domain:
      <!ATTLIST classname class CDATA "+ topic/keyword pr-d/apiname cl-d/classname ">
      <!ATTLIST fieldname class CDATA "+ topic/keyword pr-d/apiname cl-d/fieldname ">
      <!ATTLIST methodname class CDATA "+ topic/keyword pr-d/apiname cl-d/methodname ">
The complete element definition file would look as follows:
    <!ENTITY % classname "classname">
    <!ENTITY % fieldname "fieldname">
    <!ENTITY % methodname "methodname">
    <!ELEMENT classname (#PCDATA)>
    <!ELEMENT fieldname (#PCDATA)>
    <!ELEMENT methodname (#PCDATA)>
    <!ATTLIST classname class CDATA "+ topic/keyword pr-d/apiname cl-d/classname ">
    <!ATTLIST fieldname class CDATA "+ topic/keyword pr-d/apiname cl-d/fieldname ">
    <!ATTLIST methodname class CDATA "+ topic/keyword pr-d/apiname cl-d/methodname ">
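The class attribute values above are what let a process treat a classname as an apiname or a keyword. As a rough illustration, here is a small Python sketch that splits such a value into its ancestry. The helper names parse_class and is_kind_of are hypothetical and are not taken from any DITA tool.

```python
def parse_class(value):
    """Split a class attribute value into (is_domain_element, ancestry).

    ancestry is a list of (module, element) pairs ordered from the most
    general ancestor to the element itself.
    """
    tokens = value.split()
    # Per the text above, a domain element's value starts with a plus sign.
    is_domain_element = bool(tokens) and tokens[0] == "+"
    ancestry = [tuple(t.split("/", 1)) for t in tokens if "/" in t]
    return is_domain_element, ancestry


def is_kind_of(value, module, element):
    """True if the element is, or specializes from, module/element."""
    return (module, element) in parse_class(value)[1]


CLASSNAME = "+ topic/keyword pr-d/apiname cl-d/classname "
is_domain, ancestry = parse_class(CLASSNAME)
print(is_domain)   # True
print(ancestry)    # [('topic', 'keyword'), ('pr-d', 'apiname'), ('cl-d', 'classname')]
print(is_kind_of(CLASSNAME, "pr-d", "apiname"))  # True
```

A process written for the keyword or apiname element could use a test like is_kind_of to pick up the class library elements without knowing their names.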
Writing the shell DTD
After creating the domain files, you can write shell DTDs to combine the domain with topics and other domains. The shell DTD must include all domain dependencies.
In the scenario, the shell DTD combines the class library domain with the concept, reference, and task topics and the programming domain. The portions specific to the class library domain are the ones that reference the cl-d entities:
    <!--vocabulary declarations-->
    <!ENTITY % pr-d-dec PUBLIC "-//IBM//ENTITIES DITA Programming Domain//EN" "programming-domain.ent">
    %pr-d-dec;
    <!ENTITY % cl-d-dec SYSTEM "classlib-domain.ent">
    %cl-d-dec;
    <!--vocabulary substitution-->
    <!ENTITY % pre "pre | %pr-d-pre;">
    <!ENTITY % keyword "keyword | %pr-d-keyword; | %cl-d-apiname;">
    <!ENTITY % ph "ph | %pr-d-ph;">
    <!ENTITY % fig "fig | %pr-d-fig;">
    <!ENTITY % dl "dl | %pr-d-dl;">
    <!ENTITY % apiname "apiname | %cl-d-apiname;">
    <!--vocabulary attributes-->
    <!ATTLIST concept domains CDATA "&pr-d-att; &cl-d-att;">
    <!ATTLIST reference domains CDATA "&pr-d-att; &cl-d-att;">
    <!ATTLIST task domains CDATA "&pr-d-att; &cl-d-att;">
    <!--Redefine the infotype entity to exclude other topic types-->
    <!ENTITY % info-types "concept | reference | task">
    <!--Embed topic to get generic elements -->
    <!ENTITY % topic-type PUBLIC "-//IBM//ELEMENTS DITA Topic//EN" "topic.mod">
    %topic-type;
    <!--Embed topic types to get specific topic structures-->
    <!ENTITY % concept-typemod PUBLIC "-//IBM//ELEMENTS DITA Concept//EN" "concept.mod">
    %concept-typemod;
    <!ENTITY % reference-typemod PUBLIC "-//IBM//ELEMENTS DITA Reference//EN" "reference.mod">
    %reference-typemod;
    <!ENTITY % task-typemod PUBLIC "-//IBM//ELEMENTS DITA Task//EN" "task.mod">
    %task-typemod;
    <!--vocabulary definitions-->
    <!ENTITY % pr-d-def PUBLIC "-//IBM//ELEMENTS DITA Programming Domain//EN" "programming-domain.mod">
    %pr-d-def;
    <!ENTITY % cl-d-def SYSTEM "classlib-domain.mod">
    %cl-d-def;
Notice that the class library phrases are added to the element entity for keyword as well as for apiname. This addition makes the class library phrases available within topic structures that allow keywords and not just in topic structures that explicitly allow API names. In fact, the structures of the reference topic specify only keywords, but it's good practice to add the domain specialization entities to all ancestor elements.
Considerations for domain specialization
When you define new types of topics or domain elements, remember that the hierarchies for topic specialization and domain specialization must be distinct. A specialized topic cannot use a domain element in a content model. Similarly, a domain element can specialize only from an element in the base topic or in another domain. That is, a topic and a domain cannot depend on each other. To combine topics and domains, use a shell DTD.
When specializing elements with internal structure, including the ul, ol, and dl lists as well as table and simpletable, you should specialize the entire content element. Creating special types of pieces of the internal structure independently of the whole content structure usually doesn't make much sense. For example, you usually want to create a special type of list instead of a special type of li list item for ordinary ul and ol lists.
You should never specialize from the elements of the highlight domain. These style elements do not have a specific semantic. Although the formatting of the highlight styles might seem convenient, you might find you need to change the formatting later.
As noted previously, you should use element entities instead of literal element names in content models. The element entities are necessary to permit domain specialization.
The content model should allow for the possibility that the element entity might expand to a list. When applying a modifier to the element entity, you should enclose the element entity in parentheses. Otherwise, the modifier will apply only to the last element if the entity expands to a list. Similar issues affect an element entity in a sequence:
    ..., ( %classname; ), ...
    ... ( %classname; )? ...
    ... ( %classname; )* ...
    ... ( %classname; )+ ...
    ... | %classname; | ...
The parentheses aren't needed if the element entity is already in a list.
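To see why the parentheses matter, the following Python sketch expands an element entity inside a content model fragment by plain text substitution. This is a deliberate simplification of what an XML parser does, and everything in it is illustrative: the expand helper, the innerclassname element (standing in for a specialization merged in by some other domain), and the surrounding title and desc names are all hypothetical.

```python
import re


def expand(text, entities):
    """Replace each %name; reference in text with its replacement text.

    Plain textual substitution only; a simplification of real parameter
    entity processing.
    """
    return re.sub(r"%([\w.-]+);", lambda m: entities[m.group(1)], text)


# Hypothetical: another domain has added a specialization to the
# classname element entity, so it now expands to a list.
entities = {"classname": "classname | innerclassname"}

bare_optional = expand("%classname;?", entities)
grouped_optional = expand("(%classname;)?", entities)
bare_sequence = expand("title, %classname;, desc", entities)
grouped_sequence = expand("title, (%classname;), desc", entities)

print(bare_optional)      # classname | innerclassname?
print(grouped_optional)   # (classname | innerclassname)?
print(bare_sequence)      # title, classname | innerclassname, desc
print(grouped_sequence)   # title, (classname | innerclassname), desc
```

In the unparenthesized forms, the modifier ends up attached to the last name only, and the sequence mixes commas and vertical bars at the same level. The parenthesized forms keep the expanded list together as a single unit.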
Generalizing a domain
As with topics, a specialized content element can be generalized to one of its ancestor elements. In the previous scenario, a classname can generalize to apiname or even keyword. As a result, documents using different domains but the same topics can be exchanged or merged without having to generalize the topics.
To return to the highlight style controversy mentioned in Understanding the base domains, a pragmatic document authored with the highlight domain will contain phrases like the following:
... the <b>important</b> point is ...
When the document is generalized to the same topic but without the highlight domain,
the pragmatic b
element becomes a purist ph
element, indicating that the phrase is special without introducing presentation:
... the <ph class="+ topic/ph hi-d/b ">important</ph> point is ...
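The renaming step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the generalization transform that ships with any DITA tool: the function name generalize is hypothetical, and the sketch assumes that the class attribute is written out on each element in the input, whereas in practice the value is normally supplied as a default by the DTD.

```python
import xml.etree.ElementTree as ET


def generalize(root, dropped_domains):
    """Rename elements from dropped domains to their nearest ancestor type.

    The class attribute is left untouched, so the original, more
    specific type stays recorded on the generalized element.
    """
    for node in root.iter():
        # "+ topic/ph hi-d/b " -> [["topic", "ph"], ["hi-d", "b"]]
        ancestry = [t.split("/", 1)
                    for t in node.get("class", "").split() if "/" in t]
        # Walk up the hierarchy until we reach a module that is kept.
        while ancestry and ancestry[-1][0] in dropped_domains:
            ancestry.pop()
        if ancestry:
            node.tag = ancestry[-1][1]
    return root


SAMPLE = (
    '<p>The <classname class="+ topic/keyword pr-d/apiname cl-d/classname ">'
    'String</classname> class is <b class="+ topic/ph hi-d/b ">important</b>.</p>'
)

# Drop the class library and highlight domains but keep programming.
to_apiname = generalize(ET.fromstring(SAMPLE), {"cl-d", "hi-d"})
# Drop the programming domain as well.
to_keyword = generalize(ET.fromstring(SAMPLE), {"cl-d", "hi-d", "pr-d"})

print(ET.tostring(to_apiname, encoding="unicode"))
print(ET.tostring(to_keyword, encoding="unicode"))
```

With only the class library and highlight domains dropped, classname becomes apiname and b becomes ph. Dropping the programming domain too takes classname all the way back to keyword.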
In the previous scenario, the class library authors could send their topics to another
DITA shop without the class library domain. The recipients would generalize the class
library topics, converting the classname
elements to apiname
base elements. After generalization, the recipients could edit and process the class,
field, and method names in the same way as any other API names. That is, the situation
would be the same as if the senders had decided not to distinguish class, field, and
method names and, instead, had marked up these names as generic API names.
As an alternative, the recipients could decide to add the class library domain to
their definitions. In this approach, the senders would provide not only their topics
but also the entity declaration and element definition files for the domain. The recipients
would add the class library domain to their shell DTD. The recipients could then work
with classname
elements without having to generalize.
The recipients can use additional domains with no impact on interoperability. That is, the shell DTD for the recipients could use more domains than the shell DTD for the senders without creating any need to modify the topics.
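One way to picture this rule is to compare the domains attribute values of the two shops. The Python sketch below is illustrative only: the function names are hypothetical, and the attribute values assume that the supplied domains identify themselves with the same "(topic domain-id)" pattern that the class library domain uses in its domain identification entity.

```python
import re


def domain_ids(domains_value):
    """Collect the domain identifiers named in a domains attribute value."""
    ids = set()
    for group in re.findall(r"\(([^)]*)\)", domains_value):
        # Each parenthesized group starts with the topic type;
        # the remaining tokens are domain identifiers.
        ids.update(group.split()[1:])
    return ids


def domains_to_generalize(sender_domains, recipient_domains):
    """Domains used by the senders that the recipients do not have."""
    return sorted(domain_ids(sender_domains) - domain_ids(recipient_domains))


# Assumed expansions of the domain identification entities.
senders = "(topic pr-d) (topic pr-d cl-d)"
recipients = "(topic ui-d) (topic hi-d) (topic pr-d) (topic sw-d)"

print(domains_to_generalize(senders, recipients))        # ['cl-d']
print(domains_to_generalize("(topic pr-d)", recipients))  # []
```

An empty result corresponds to the case described above: the recipients have every domain the senders used, plus possibly more, so the topics need no changes.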
Note: Special processing for the classname element might generate a literal class label in the output to save some typing and produce consistent labels. After automated generalization, however, the label would not be supplied by the base processing for the apiname element. Thus, the dependency would require a special generalization transform to append the literal class label to classname elements in the source file.
Summary
Through topic specialization and domains, DITA provides the following benefits:
- Simpler topic design.
  The document designer can focus on the structure of the topic without having to foresee every variety of content used within the structure.
- Simpler topic hierarchies.
  The document designer can add new types of content without having to add new types of topics.
- Extensible content for existing topics.
  The document designer can reuse existing types of topics with new types of content.
- Semantic precision.
  Content elements with more specific semantics can be derived from existing elements and used freely within documents.
- Simpler element lists for authors.
  The document designer can select domains to minimize the element set. Authors can learn the elements that are appropriate for the document instead of learning to disregard unneeded elements.
In short, the DITA domain feature provides for great flexibility in extending and reusing information types. The highlight, programming, software, and UI domains provided with the base DITA release are only the beginning of what can be accomplished.
Notices
© Copyright International Business Machines Corp., 2002, 2003. All rights reserved.
The information provided in this document has not been submitted to any formal IBM test and is distributed "AS IS," without warranty of any kind, either express or implied. The use of this information or the implementation of any of these techniques described in this document is the reader's responsibility and depends on the reader's ability to evaluate and integrate them into their operating environment. Readers attempting to adapt these techniques to their own environments do so at their own risk.