XML Convert R←{X} ⎕XML Y

⎕XML converts an XML string into an APL array or converts an APL array into an XML string.

Options for ⎕XML are specified using the Variant operator or by the optional left argument X. The former is recommended but the older mechanism using the left argument is still supported.

For conversion from XML, Y is a character vector containing an XML string. The result R is a 5 column matrix whose columns are made up as follows:

Column Description
1 Numeric value which indicates the level of nesting
2 Element name, other markup text, or empty character vector when empty
3 Character data or empty character vector when empty
4 Attribute name and value pairs, (0 2⍴⊂'') when empty
5 A numeric value which indicates what the row contains

The values in column 5 have the following meanings:

Value Description
1 Element
2 Child element
4 Character data
8 Markup not otherwise defined
16 Comment markup
32 Processing instruction markup

These values are additive. For example, a value of 5 in column 5 means that the row contains both an element (value 1) and character data (value 4).

Example

      x←'<xml><document id="001">An introduction to XML'
      x,←'</document></xml>'
      ]display v←⎕XML x
┌→───────────────────────────────────────────────────────┐
↓   ┌→──┐      ┌⊖┐                      ┌→────────┐      │
│ 0 │xml│      │ │                      ⌽ ┌⊖┐ ┌⊖┐ │    3 │
│   └───┘      └─┘                      │ │ │ │ │ │      │
│                                       │ └─┘ └─┘ │      │
│                                       └∊────────┘      │
│   ┌→───────┐ ┌→─────────────────────┐ ┌→───────────┐   │
│ 1 │document│ │An introduction to XML│ ↓ ┌→─┐ ┌→──┐ │ 5 │
│   └────────┘ └──────────────────────┘ │ │id│ │001│ │   │
│                                       │ └──┘ └───┘ │   │
│                                       └∊───────────┘   │
└∊───────────────────────────────────────────────────────┘

For conversion to XML, Y is a 3, 4 or 5 column matrix and the result R is a character vector. The columns of Y have the same meaning as those described above for the result of converting from XML.

Example

      ⎕XML v
<xml> 
  <document id="001">An introduction to XML</document>
</xml>

Introduction to XML and Glossary of Terms

XML is an open standard, designed to allow exchange of data between applications. The full specification 1 describes functionality, including processing directives and other directives, which can transform XML data as it is read, and which a full XML processor would be expected to handle.

The ⎕XML function is designed to handle XML to the extent required to import and export APL data. It favours speed over complexity - some markup is tolerated but largely ignored, and there are no XML query or validation features. APL applications which require processing, querying or validation will need to call external tools for this, and finally call ⎕XML on the resulting XML to perform the transformation into APL arrays.

XML grammar such as processing instructions, document type declarations etc. may optionally be stored in the APL array, but will not be processed or validated. This is principally to allow regeneration of XML from XML input which contains such structures, but an APL application could process the data if it chose to do so.

The XML definition uses specific terminology to describe its component parts. The following is a summary of the terms used in this section:

Character Data

Character data consists of free-form text. The free-form text should not include the characters '>', '<' or '&', so these must be represented by their entity references ('&gt;', '&lt;' and '&amp;' respectively), or numeric character references.

Entity References and Character References

Entity references are named representations of single characters which cannot normally be used in character data because they are used to delimit markup, such as &gt; for '>'. Character references are numeric representations of any character, such as &#20; for space. Note that character references always take values in the Unicode code space, regardless of the encoding of the XML text itself.

⎕XML converts entity references and all character references which the APL character set is able to represent into their character equivalent when generating APL array data; when generating XML it converts any or all characters to entity references as needed.

There is a predefined set of entity references, and the XML specification allows others to be defined within the XML using the <!ENTITY > markup. ⎕XML does not process these additional declarations and therefore will only convert the predefined types.

Whitespace

Whitespace sequences consist of one or more spaces, tabs or line-endings. Within character data, sequences of one or more whitespace characters are replaced with a single space when this is enabled by the whitespace option. Line endings are represented differently on different systems (0x0D 0x0A, 0x0A and 0x0D are all used) but are normalized by converting them all to 0x0A before the XML is parsed, regardless of the setting of the whitespace option.

Elements

An element consists of a balanced pair of tags or a single empty element tag. Tags are given names, and start and end tag names must match.

An example pair of tags, named TagName is

<TagName></TagName>

This pair is shown with no content between the tags; this may be abbreviated as an empty element tag as

<TagName/>

Tags may be given zero or more attributes, which are specified as name/value pairs; for example

<TagName AttName="AttValue">

Attribute values may be delimited by either double quotes as shown or single quotes (apostrophes); they may not contain certain characters (the delimiting quote, '&' or '<') and these must be represented by entity or character references.

The content of elements may be zero or more mixed occurrences of character data and nested elements. Tags and attribute names describe data, attribute values and the content within tags contain the data itself. Nesting of elements allows structure to be defined.

Because certain markup which describes the format of allowable data (such as element type declarations and attribute-list declarations) is not processed, no error will be reported if element contents and attributes do not conform to their restricted declarations, nor are attributes automatically added to tags if not explicitly given.

Attributes with names beginning xml: are reserved. Only xml:space is treated specially by ⎕XML. When converting both from and to XML, the value for this attribute has the following effects on space normalization for the character data within this element and child elements within it (unless subsequently overridden):

Regardless of whether the attribute name and value have a recognised meaning, the attribute will be included in the APL array / generated XML. Note that when the names and values of attributes are examined, the comparisons are case-sensitive and take place after entity references and character references have been expanded.

Comments

Comments are fully supported markup. They are delimited by '<!--' and '-->' and all text between these delimiters is ignored. This text is included in the APL array if markup is being preserved, or discarded otherwise.

CDATA Sections

CDATA Sections are fully supported markup. They are used to delimit text within character data which has, or may have, markup text in it which is not to be processed as such. They and are delimited by '<![CDATA[' and ']]>'. CDATA sections are never recorded in the APL array as markup when XML is processed – instead, that data appears as character data. Note that this means that if you convert XML to an APL array and then convert this back to XML, CDATA sections will not be regenerated. It is, however, possible to generate CDATA sections in XML by presenting them as markup.

Processing Instructions

Processing Instructions are delimited by '<&' and '&>' but are otherwise treated as other markup, below.

Other markup

The remainder of XML markup, including document type declarations,  XML declarations and text declarations are all delimited by '<!' and '>', and may contain nested markup. If markup is being preserved the text, including nested markup, will appear as a single row in the APL array.  ⎕XML does not process the contents of such markup. This has varying effects, including but not limited to the following:

Conversion from XML

Conversion to XML

Conversion to XML takes an array with the format described above and generates XML text from it. There are some simplifications to the array which are accepted:

The following validations are performed on the data in the array:

Then, character references and entity references are emitted in place of characters where necessary, to ensure that valid XML is generated. However, markup, if present, is not validated and it is possible to generate invalid XML if care in not taken with markup constructs.

Options

There are 3 options which may be specified using the Variant operator (recommended) or by the optional left argument X (retained for backwards compatibility). The names are different and are case-sensitive; they must be spelled exactly as shown below.

Option names for Variant Option names for left argument
Whitespace whitespace
Markup markup
UnknownEntity unknown-entity

The values of each option are tabulated below. In each case the value of the option for Variant is given first, followed by its equivalent for the optional left argument in brackets; e.g. UnknownEntity (unknown-entity).

Note that the default value is shown first, and that the option names and values are case-sensitive.

If options are specified using the optional left argument, X specifies a set of option/value pairs, each of which is a character vector. X may be a 2-element vector, or a vector of 2-element character vectors. In the examples below, this method is illustrated by the equivalent expression written as a comment, following the recommended approach using the Variant operator . i.e.

      ]display (⎕XML⍠'Whitespace' 'Strip')eg
      ⍝      'whitespace' 'strip' ⎕XML eg

Errors detected in the input arrays or options will all cause DOMAIN ERROR.

Whitespace (whitespace)

When converting from XML Whitespace specifies the default handling of white space surrounding and within character data. When converting to XML Whitespace specifies the default formatting of the XML. Note that attribute values are not comprised of character data so white space in attribute values is always preserved.

Converting from XML
Strip
(strip)
All leading and trailing whitespace sequences are removed; remaining whitespace sequences are replaced by a single space character
Trim
(trim)
All leading and trailing whitespace sequences are removed; all remaining white space sequences are handled as preserve
Preserve
(preserve)
Whitespace is preserved as given except that line endings are represented by Linefeed (⎕UCS 10)
Converting to XML
Strip
(strip)
All leading and trailing whitespace sequences are removed; remaining whitespace sequences within the data are replaced by a single space character. XML is generated with formatting and indentation to show the data structure
Trim
(trim)
Synonymous with strip
Preserve
(preserve)
White space in the data is preserved as given, except that line endings are represented by Linefeed (⎕UCS 10). XML is generated with no formatting and indentation other than that which is contained within the data
      ]display eg
┌→───────────────────┐
│<xml>               │
│  <a>               │
│    Data1           │
│    <!-- Comment -->│
│    Data2           │
│    <b> Data3 </b>  │
│    Data4           │
│    <c att="val"/>  │
│  </a>              │
│</xml>              │
└────────────────────┘
 
      ]display (⎕XML⍠'Whitespace' 'Strip')eg
      ⍝      'whitespace' 'strip' ⎕XML eg
┌→────────────────────────────────────────┐
↓   ┌→──┐ ┌⊖┐           ┌→────────┐       │
│ 0 │xml│ │ │           ⌽ ┌⊖┐ ┌⊖┐ │     3 │
│   └───┘ └─┘           │ │ │ │ │ │       │
│                       │ └─┘ └─┘ │       │
│                       └∊────────┘       │
│   ┌→┐   ┌⊖┐           ┌→────────┐       │
│ 1 │a│   │ │           ⌽ ┌⊖┐ ┌⊖┐ │     7 │
│   └─┘   └─┘           │ │ │ │ │ │       │
│                       │ └─┘ └─┘ │       │
│                       └∊────────┘       │
│   ┌⊖┐   ┌→──────────┐ ┌→────────┐       │
│ 2 │ │   │Data1 Data2│ ⌽ ┌⊖┐ ┌⊖┐ │     4 │
│   └─┘   └───────────┘ │ │ │ │ │ │       │
│                       │ └─┘ └─┘ │       │
│                       └∊────────┘       │
│   ┌→┐   ┌→────┐       ┌→────────┐       │
│ 2 │b│   │Data3│       ⌽ ┌⊖┐ ┌⊖┐ │     5 │
│   └─┘   └─────┘       │ │ │ │ │ │       │
│                       │ └─┘ └─┘ │       │
│                       └∊────────┘       │
│   ┌⊖┐   ┌→────┐       ┌→────────┐       │
│ 2 │ │   │Data4│       ⌽ ┌⊖┐ ┌⊖┐ │     4 │
│   └─┘   └─────┘       │ │ │ │ │ │       │
│                       │ └─┘ └─┘ │       │
│                       └∊────────┘       │
│   ┌→┐   ┌⊖┐           ┌→────────────┐   │
│ 2 │c│   │ │           ↓ ┌→──┐ ┌→──┐ │ 1 │
│   └─┘   └─┘           │ │att│ │val│ │   │
│                       │ └───┘ └───┘ │   │
│                       └∊────────────┘   │
└∊────────────────────────────────────────┘
 
      ]display (⎕XML⍠'Whitespace' 'Preserve')eg
      ⍝         'whitespace' 'preserve' ⎕XML eg
┌→──────────────────────────────────────┐
↓   ┌→──┐ ┌⊖┐         ┌→────────┐       │
│ 0 │xml│ │ │         ⌽ ┌⊖┐ ┌⊖┐ │     7 │
│   └───┘ └─┘         │ │ │ │ │ │       │
│                     │ └─┘ └─┘ │       │
│                     └∊────────┘       │
│   ┌⊖┐   ┌→─┐        ┌→────────┐       │
│ 1 │ │   │  │        ⌽ ┌⊖┐ ┌⊖┐ │     4 │
│   └─┘   │  │        │ │ │ │ │ │       │
│         └──┘        │ └─┘ └─┘ │       │
│                     └∊────────┘       │
│   ┌→┐   ┌⊖┐         ┌→────────┐       │
│ 1 │a│   │ │         ⌽ ┌⊖┐ ┌⊖┐ │     7 │
│   └─┘   └─┘         │ │ │ │ │ │       │
│                     │ └─┘ └─┘ │       │
│                     └∊────────┘       │
│   ┌⊖┐   ┌→────────┐ ┌→────────┐       │
│ 2 │ │   │         │ ⌽ ┌⊖┐ ┌⊖┐ │     4 │
│   └─┘   │    Data1│ │ │ │ │ │ │       │
│         │         │ │ └─┘ └─┘ │       │
│         │    Data2│ └∊────────┘       │
│         │         │                   │
│         └─────────┘                   │
│   ┌→┐   ┌→──────┐   ┌→────────┐       │
│ 2 │b│   │ Data3 │   ⌽ ┌⊖┐ ┌⊖┐ │     5 │
│   └─┘   └───────┘   │ │ │ │ │ │       │
│                     │ └─┘ └─┘ │       │
│                     └∊────────┘       │
│   ┌⊖┐   ┌→────────┐ ┌→────────┐       │
│ 2 │ │   │         │ ⌽ ┌⊖┐ ┌⊖┐ │     4 │
│   └─┘   │    Data4│ │ │ │ │ │ │       │
│         │         │ │ └─┘ └─┘ │       │
│         └─────────┘ └∊────────┘       │
│   ┌→┐   ┌⊖┐         ┌→────────────┐   │
│ 2 │c│   │ │         ↓ ┌→──┐ ┌→──┐ │ 1 │
│   └─┘   └─┘         │ │att│ │val│ │   │
│                     │ └───┘ └───┘ │   │
│                     └∊────────────┘   │
│   ┌⊖┐   ┌→─┐        ┌→────────┐       │
│ 2 │ │   │  │        ⌽ ┌⊖┐ ┌⊖┐ │     4 │
│   └─┘   │  │        │ │ │ │ │ │       │
│         └──┘        │ └─┘ └─┘ │       │
│                     └∊────────┘       │
│   ┌⊖┐   ┌→┐         ┌→────────┐       │
│ 1 │ │   │ │         ⌽ ┌⊖┐ ┌⊖┐ │     4 │
│   └─┘   │ │         │ │ │ │ │ │       │
│         └─┘         │ └─┘ └─┘ │       │
│                     └∊────────┘       │
└∊──────────────────────────────────────┘

Markup (markup)

When converting from XML, Markup determines whether markup (other than entity tags) appears in the output array or not. When converting to XML Markup has no effect.

Converting from XML
Strip
(strip)
Markup data is not included in the output array
Preserve
(preserve)
Markup text appears in the output array, without the leading '<' and trailing '>' in the tag (2nd) column
  
      ]display eg
┌→───────────────────┐
│<xml>               │
│  <a>               │
│    Data1           │
│    <!-- Comment -->│
│    Data2           │
│    <b> Data3 </b>  │
│    Data4           │
│    <c att="val"/>  │
│  </a>              │
│</xml>              │
└────────────────────┘
 
      ]display (⎕XML⍠'Markup' 'Strip')eg
      ⍝      'markup' 'strip' ⎕XML eg
┌→────────────────────────────────────────┐
↓   ┌→──┐ ┌⊖┐           ┌→────────┐       │
│ 0 │xml│ │ │           ⌽ ┌⊖┐ ┌⊖┐ │     3 │
│   └───┘ └─┘           │ │ │ │ │ │       │
│                       │ └─┘ └─┘ │       │
│                       └∊────────┘       │
│   ┌→┐   ┌⊖┐           ┌→────────┐       │
│ 1 │a│   │ │           ⌽ ┌⊖┐ ┌⊖┐ │     7 │
│   └─┘   └─┘           │ │ │ │ │ │       │
│                       │ └─┘ └─┘ │       │
│                       └∊────────┘       │
│   ┌⊖┐   ┌→──────────┐ ┌→────────┐       │
│ 2 │ │   │Data1 Data2│ ⌽ ┌⊖┐ ┌⊖┐ │     4 │
│   └─┘   └───────────┘ │ │ │ │ │ │       │
│                       │ └─┘ └─┘ │       │
│                       └∊────────┘       │
│   ┌→┐   ┌→────┐       ┌→────────┐       │
│ 2 │b│   │Data3│       ⌽ ┌⊖┐ ┌⊖┐ │     5 │
│   └─┘   └─────┘       │ │ │ │ │ │       │
│                       │ └─┘ └─┘ │       │
│                       └∊────────┘       │
│   ┌⊖┐   ┌→────┐       ┌→────────┐       │
│ 2 │ │   │Data4│       ⌽ ┌⊖┐ ┌⊖┐ │     4 │
│   └─┘   └─────┘       │ │ │ │ │ │       │
│                       │ └─┘ └─┘ │       │
│                       └∊────────┘       │
│   ┌→┐   ┌⊖┐           ┌→────────────┐   │
│ 2 │c│   │ │           ↓ ┌→──┐ ┌→──┐ │ 1 │
│   └─┘   └─┘           │ │att│ │val│ │   │
│                       │ └───┘ └───┘ │   │
│                       └∊────────────┘   │
└∊────────────────────────────────────────┘
      ]display (⎕XML⍠'Markup' 'Preserve')eg
      ⍝         'markup' 'preserve' ⎕XML eg
┌→──────────────────────────────────────────────┐
↓   ┌→──┐            ┌⊖┐     ┌→────────┐        │
│ 0 │xml│            │ │     ⌽ ┌⊖┐ ┌⊖┐ │     3  │
│   └───┘            └─┘     │ │ │ │ │ │        │
│                            │ └─┘ └─┘ │        │
│                            └∊────────┘        │
│   ┌→┐              ┌⊖┐     ┌→────────┐        │
│ 1 │a│              │ │     ⌽ ┌⊖┐ ┌⊖┐ │     23 │
│   └─┘              └─┘     │ │ │ │ │ │        │
│                            │ └─┘ └─┘ │        │
│                            └∊────────┘        │
│   ┌⊖┐              ┌→────┐ ┌→────────┐        │
│ 2 │ │              │Data1│ ⌽ ┌⊖┐ ┌⊖┐ │     4  │
│   └─┘              └─────┘ │ │ │ │ │ │        │
│                            │ └─┘ └─┘ │        │
│                            └∊────────┘        │
│   ┌→─────────────┐ ┌⊖┐     ┌→────────┐        │
│ 2 │!-- Comment --│ │ │     ⌽ ┌⊖┐ ┌⊖┐ │     16 │
│   └──────────────┘ └─┘     │ │ │ │ │ │        │
│                            │ └─┘ └─┘ │        │
│                            └∊────────┘        │
│   ┌⊖┐              ┌→────┐ ┌→────────┐        │
│ 2 │ │              │Data2│ ⌽ ┌⊖┐ ┌⊖┐ │     4  │
│   └─┘              └─────┘ │ │ │ │ │ │        │
│                            │ └─┘ └─┘ │        │
│                            └∊────────┘        │
│   ┌→┐              ┌→────┐ ┌→────────┐        │
│ 2 │b│              │Data3│ ⌽ ┌⊖┐ ┌⊖┐ │     5  │
│   └─┘              └─────┘ │ │ │ │ │ │        │
│                            │ └─┘ └─┘ │        │
│                            └∊────────┘        │
│   ┌⊖┐              ┌→────┐ ┌→────────┐        │
│ 2 │ │              │Data4│ ⌽ ┌⊖┐ ┌⊖┐ │     4  │
│   └─┘              └─────┘ │ │ │ │ │ │        │
│                            │ └─┘ └─┘ │        │
│                            └∊────────┘        │
│   ┌→┐              ┌⊖┐     ┌→────────────┐    │
│ 2 │c│              │ │     ↓ ┌→──┐ ┌→──┐ │ 1  │
│   └─┘              └─┘     │ │att│ │val│ │    │
│                            │ └───┘ └───┘ │    │
│                            └∊────────────┘    │
└∊──────────────────────────────────────────────┘

UnknownEntity (unknown-entity)

When converting from XML, this option determines what happens when an unknown entity reference, or a character reference for a Unicode character which cannot be represented as an APL character, is encountered. In Classic versions of Dyalog APL that is any Unicode character which does not appear in ⎕AVU. When converting to XML, this option determines what happens to Esc characters (⎕UCS 27) in data.

Converting from XML
Replace
(replace)
The reference is replaced by a single '?' character
Preserve
(preserve)
The reference is included in the output data as given, but with the leading '&' replaced by Esc (⎕UCS 27)
Converting to XML
Replace
(replace)
Esc (⎕UCS 27) is preserved
Preserve
(preserve)
Esc (⎕UCS 27) is replaced by '&'