Comma Separated Values {R}←{X} ⎕CSV Y

This function imports and exports Comma Separated Value (CSV) data.

Monadic ⎕CSV imports data from a CSV file or converts data from CSV format to an internal format. Dyadic ⎕CSV exports data to a CSV file or converts data from internal format to a CSV format.

Internal Format

Arrays that result from importing CSV data or arrays that are suitable for exporting as CSV data are represented by 3 possible structures:

Note that when importing CSV data, all fields are assumed to be character fields unless otherwise specified (see Column Types below). A field that contains only "numbers" will not be converted to numeric data unless specified as being numeric.

MetaCharacters

Some characters in a CSV file are metacharacters which define the structure of the data; for example, the field separator character between fields. Characters which are not metacharacters are literal characters. The variant options QuoteChar, EscapeChar and DoubleQuote make it possible to interpret metacharacters as literal characters and thus permit fields to contain field separator characters, leading and trailing spaces, and line-endings.

Fixed-width fields do not require these options and they are ignored if fixed-width fields are being processed.

Monadic ⎕CSV

R←⎕CSV Y

Y is an array that specifies just the source of the CSV data (see below) or a 1,2,3 or 4-element vector containing:

[1] Source of CSV Data
[2] Description of the CSV data
[3] Column Types
[4] Header Row Indicator

Source may be one of:

Description

If Y[1] is a file name or tie number Description may be one of:

If omitted or empty, the file encoding is deduced (see below).

If Y[1] is a character array containing CSV data Description is a character scalar 'S' (simple) or 'N' (nested). The default is 'N'

Column Types

This is a scalar numeric code or vector of numeric codes that specifies the field types from the list below. If Column Types is zilde or omitted, the default is 1 (all fields are character).

0 The field is ignored.
1 The field contains character data.
2 The field is to be interpreted as being numeric. Empty cells and cells which cannot be converted to numeric values are not tolerated and cause an error to be signalled.
3 The field is to be interpreted as being numeric but invalid numeric values are tolerated. Empty fields and fields which cannot be converted to numeric values are replaced with the Fill variant option (default 0).
4 The field is to be interpreted numeric data but invalid numeric data is tolerated. Empty fields and fields which cannot be converted to numeric values are returned instead as character data; this type is disallowed when variant option Invert is set to 1.
5 The field is to be interpreted as being numeric but empty fields are tolerated and are replaced with the Fill variant option (default 0). Non-empty cells which cannot be converted to numeric values are not tolerated and cause an error to be signalled.

Note that if Column Types is specified by a scalar 4, all numeric data in all fields will be converted to numbers.

Header Row Indicator

This is a Boolean value (default 0) to specify whether or not the first record in a CSV file is a list of column labels. If Header Row Indicator is 1, the first record (the header row) is treated differently from other records. It is assumed to contain character data (labels) regardless of Y[3] and is returned separately in the result.

Variant options

Monadic ⎕CSV may be applied using the Variant operator with the following options. The Principal option is Invert.

Name Meaning Default
Invert 0, 1 or 2 (see below) 0
Separator The field separator, any single character. If Widths is other than , Separator is ignored. ','
Widths A vector of numeric values describing the width (in characters) of the corresponding columns in the CSV source, or for variable width delimited fields
Decimal The decimal mark in numeric fields - one of '.' or ',' '.'
Thousands The thousands separator in numeric fields, which may be specified as an empty character vector (meaning no separator is defined) or a character scalar ''
Trim A Boolean specifying whether undelimited/unescaped whitespace is trimmed at the beginning and end of fields 1
Ragged A Boolean specifying whether records with varying numbers of fields are allowed; see notes below 0
Fill The numeric value substituted for invalid numeric data in columns of type 3 0
Records The maximum number of records to process or 0 for no limit. This applies only to a file specified by a tie number. 0
QuoteChar The field quote character (delimiter), which may be specified as an empty character vector (meaning none is defined) or a character scalar "
EscapeChar The escape character, which may be specified as an empty character vector (meaning none is defined) or a character scalar ''
DoubleQuote A Boolean which indicates whether (1) or not (0) a quote character within a quoted field is represented by two consecutive quote characters 1

The Separator, QuoteChar and EscapeChar characters, when defined, must be different.

Other options defined for export are also accepted but ignored.

Invert Option

This option specifies how the CSV data should be returned as follows:

0 A table (a matrix whose elements are character vectors or scalars or numbers).
1 A vector, each of whose items contain field (column) values. Character field values are character matrices; numeric field values are numeric vectors.
2 A vector, each of whose items contain field (column) values. Character field values are vectors of character vectors; numeric field values are numeric vectors.

QuoteChar, EscapeChar and DoubleQuote Options

If EscapeChar is set then any character may be prefixed by the escape character. The escape character is typically defined as '\'. The escape character immediately followed by the character c is the literal character c even if c alone would have been a metacharacter.

If QuoteChar is set then fields may be delimited by the specified quote character. Within quoted fields all characters except the quote character, and the escape character if defined, are literal characters.

If DoubleQuote is set to 1 then two consecutive quote characters within a quoted field are interpreted as the single literal quote character.

Result

The result R contains the imported data.

If Y[4] does not specify that the data contains a header then R contains the entire data in the form specified by the Invert variant option.

If Y[4] does specify that the data contains a header then R is a 2-element vector where:

Examples

      ⊃⎕NGET CSVFile←'c:\Dyalog16.0\sales.csv'
┌→───────────────────────────────────────────────┐
│Product,Sales                                   │
│             Widgets,1912                       │
│                         Gimlets,205            │
│                                    Dingbats,189│
│                                                │
└────────────────────────────────────────────────┘
      ⎕CSV CSVFile
┌→───────────────────┐
↓ ┌→──────┐  ┌→────┐ │
│ │Product│  │Sales│ │
│ └───────┘  └─────┘ │
│ ┌→──────┐  ┌→───┐  │
│ │Widgets│  │1912│  │
│ └───────┘  └────┘  │
│ ┌→──────┐  ┌→──┐   │
│ │Gimlets│  │205│   │
│ └───────┘  └───┘   │
│ ┌→───────┐ ┌→──┐   │
│ │Dingbats│ │189│   │
│ └────────┘ └───┘   │
└∊───────────────────┘
      ⎕CSV CSVFile'' ⍬ 1 ⍝ Header row
┌→────────────────────────────────────────────┐
│ ┌→──────────────────┐ ┌→──────────────────┐ │
│ ↓ ┌→──────┐  ┌→───┐ │ │ ┌→──────┐ ┌→────┐ │ │
│ │ │Widgets│  │1912│ │ │ │Product│ │Sales│ │ │
│ │ └───────┘  └────┘ │ │ └───────┘ └─────┘ │ │
│ │ ┌→──────┐  ┌→──┐  │ └∊──────────────────┘ │
│ │ │Gimlets│  │205│  │                       │
│ │ └───────┘  └───┘  │                       │
│ │ ┌→───────┐ ┌→──┐  │                       │
│ │ │Dingbats│ │189│  │                       │
│ │ └────────┘ └───┘  │                       │
│ └∊──────────────────┘                       │
└∊────────────────────────────────────────────┘
      ⎕CSV CSVFile''(1 2)1 ⍝ Fields are Char, Num
┌→──────────────────────────────────────────┐
│ ┌→────────────────┐ ┌→──────────────────┐ │
│ ↓ ┌→──────┐       │ │ ┌→──────┐ ┌→────┐ │ │
│ │ │Widgets│  1912 │ │ │Product│ │Sales│ │ │
│ │ └───────┘       │ │ └───────┘ └─────┘ │ │
│ │ ┌→──────┐       │ └∊──────────────────┘ │
│ │ │Gimlets│  205  │                       │
│ │ └───────┘       │                       │
│ │ ┌→───────┐      │                       │
│ │ │Dingbats│ 189  │                       │
│ │ └────────┘      │                       │
│ └∊────────────────┘                       │
└∊──────────────────────────────────────────┘
      (⎕CSV⍠'Invert' 1)CSVFile'' (1 2) 1    ⍝ Invert 1
┌→────────────────────────────────────────────────────┐
│ ┌→──────────────────────────┐ ┌→──────────────────┐ │
│ │ ┌→───────┐ ┌→───────────┐ │ │ ┌→──────┐ ┌→────┐ │ │
│ │ ↓Widgets │ │1912 205 189│ │ │ │Product│ │Sales│ │ │
│ │ │Gimlets │ └~───────────┘ │ │ └───────┘ └─────┘ │ │
│ │ │Dingbats│                │ └∊──────────────────┘ │
│ │ └────────┘                │                       │
│ └∊──────────────────────────┘                       │
└∊────────────────────────────────────────────────────┘
      ⊃(⎕CSV⍠'Invert' 2)CSVFile'' (1 2) 1    ⍝ Invert 2
┌→──────────────────────────────────────────────────┐
│ ┌→───────────────────────────────┐ ┌→───────────┐ │
│ │ ┌→──────┐ ┌→──────┐ ┌→───────┐ │ │1912 205 189│ │
│ │ │Widgets│ │Gimlets│ │Dingbats│ │ └~───────────┘ │
│ │ └───────┘ └───────┘ └────────┘ │                │
│ └∊───────────────────────────────┘                │
└∊──────────────────────────────────────────────────┘

Notes

File handling

Data may be read from a named file or a tied native file. A tied native file may be read in sections by repeatedly invoking ⎕CSV for a specified maximum number of records (specified by the Records variant) until no more data is read.

In all cases the files must contain text using one of the supported encodings. See File Encodings. The method used to determine the file encoding is as follows:

Note also:

Dyadic ⎕CSV

{R}←X ⎕CSV Y

The left argument X is either:

Y is a 1 or 2-element vector containing:

[1] Destination of CSV Data (see below)
[2] Description of the CSV data (see below)

Destination - may be one of:

Description

If Y[1] is a file name or tie number, Description may be:

If Y[1] is empty, Description may be a character scalar 'S' (simple) or 'N' (nested). If omitted, the default is 'S'

Variant options

Dyadic ⎕CSV may be applied using the Variant operator with the following options.

Name Meaning Default
IfExists a character vector 'Error' or 'Replace' which specifies, when creating a named file which already exists, whether to overwrite it ('Replace') or signal an error ('Error') 'Error'
Separator the field separator, any single character. If Widths is other than , Separator is ignored. ','
Widths a vector of numeric values describing the width (in characters) of the corresponding columns in the CSV source, or for variable width delimited fields
Decimal the decimal mark in numeric fields - one of '.' or ',' '.'
Thousands the thousands separator in numeric fields, which may be specified as an empty character vector (meaning no separator is defined) or a character scalar ''
Trim a Boolean specifying whether whitespace is trimmed at the beginning and end of character fields 1
LineEnding the line ending sequence - see Line separators: (13 10) on Windows; 10 on other platforms
QuoteChar The field quote character (delimiter), which may be specified as an empty character vector (meaning none is defined) or a character scalar "
EscapeChar The escape character, which may be specified as an empty character vector (meaning none is defined) or a character scalar ''
DoubleQuote A Boolean which indicates whether (1) or not (0) a quote character within a quoted field is represented by two consecutive quote characters 1

The Separator, QuoteChar and EscapeChar characters, when defined, must be different. Other options defined for import are also accepted but ignored.

The Overwrite variant option (Boolean) from Version 16.0 remains supported but is deprecated in favour of IfExists.

QuoteChar, EscapeChar and DoubleQuote options

If Y specifies that the CSV data is written to a file then R is the number of bytes (not characters) written, and is shy.

Otherwise, R is the CSV data in the format specified in Y, and is not shy.

Examples

       CSVFile←'c:\Dyalog16.0\sales.csv'
       ⎕←DATA HDR←⎕CSV CSVFile''(1 2)1
┌→──────────────────────────────────────────┐
│ ┌→────────────────┐ ┌→──────────────────┐ │
│ ↓ ┌→──────┐       │ │ ┌→──────┐ ┌→────┐ │ │
│ │ │Widgets│  1912 │ │ │Product│ │Sales│ │ │
│ │ └───────┘       │ │ └───────┘ └─────┘ │ │
│ │ ┌→──────┐       │ └∊──────────────────┘ │
│ │ │Gimlets│  205  │                       │
│ │ └───────┘       │                       │
│ │ ┌→───────┐      │                       │
│ │ │Dingbats│ 189  │                       │
│ │ └────────┘      │                       │
│ └∊────────────────┘                       │
└∊──────────────────────────────────────────┘
     DATA⍪←'Gizmos' 23
      DATA HDR ⎕CSV''
┌→────────────┐
│Product,Sales│
│             │
│Widgets,1912 │
│             │
│Gimlets,205  │
│             │
│Dingbats,189 │
│             │
│Gizmos,23    │
│             │
│             │
└─────────────┘
       CSVFile1←'c:\Dyalog16.0\sales1.csv'
       ⎕←DATA HDR ⎕CSV CSVFile1
  
67
       DATA⍪←'Gimbals' 123
       ⎕←DATA HDR ⎕CSV CSVFile1
FILE NAME ERROR: Unable to create file ("The file exists.")
       ⎕←DATA HDR ⎕CSV CSVFile1
      ∧
       ⎕←DATA HDR(⎕CSV⍠'IfExists' 'Replace')CSVFile1
  
80 

Notes