A number of primitive functions have been modified to deliver significantly improved performance in Version 12.1.
Many scans and reductions on Boolean arrays are considerably faster in Version 12.1. These include the following derived functions:
∧/ ∨/ =/ ≠/ +/ ∧\ ∨\ =\ ≠\ <\
Many inner products where one of the above functions is used as the left function operand are also significantly faster, as they use Boolean reduction.
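As a brief illustration of the kind of expressions affected, the session below applies a few of the listed derived functions to a Boolean vector, followed by an inner product whose left operand (∨) is one of those functions. The names B and M are arbitrary sample data chosen for this example only.

```apl
      B←1 0 1 1 0 1
      +/B                 ⍝ number of 1s
4
      ≠\B                 ⍝ running parity
1 1 0 1 1 0
      <\B                 ⍝ keep only the first 1
1 0 0 0 0 0
      M←3 3⍴0 1 0 0 0 1 0 0 0
      M∨.∧M               ⍝ Boolean inner product with ∨ as left operand
0 0 1
0 0 0
0 0 0
```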
The performance of Grade Up and Grade Down has been improved, especially for small-range vectors. A small-range array is a simple integer or character array where the difference between its minimum and maximum values is less than twice its number of items.
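The small-range definition can be checked with a short dfn. This is only a sketch of the stated rule for non-empty integer arrays; SmallRange is a hypothetical name, and the interpreter's own internal test may differ in detail.

```apl
      ⍝ Hypothetical helper: 1 if (max-min) is less than twice the item count
      SmallRange←{((⌈/,⍵)-⌊/,⍵)<2××/⍴⍵}
      SmallRange 3 1 4 1 5          ⍝ range 4, 5 items
1
      SmallRange 1 1000000          ⍝ range 999999, 2 items
0
```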
In addition, the following expressions have been implemented as idioms for small-range vectors and narrow small-range matrices (fewer than 8 bytes per row):
{⍵[⍋⍵]} {⍵[⍒⍵]} {⍵[⍋⍵;]} {⍵[⍒⍵;]}
Note that it is only these precise expressions that have been optimized by idiom recognition; expressions such as X[⍋X;] are not recognized as idioms.
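For example, with a narrow small-range matrix (M here is arbitrary sample data), the dfn form and the direct indexing form give the same result; according to the note above, only the dfn form is recognized as an idiom.

```apl
      M←4 2⍴3 1 1 2 3 0 1 1
      {⍵[⍋⍵;]} M          ⍝ idiom form: rows sorted ascending
1 1
1 2
3 0
3 1
      M[⍋M;]              ⍝ same result, but not recognized as an idiom
1 1
1 2
3 0
3 1
```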
Compress on the leading axis has been tuned.
Matrix multiplication (+.×) has been rewritten and is substantially faster in Version 12.1.
The basic arithmetic dyadic functions (+ - ×) have been rewritten in assembler (x86 platforms) using SSE2 operations where possible.
The new code is implemented in Dyalog APL Version 12.1 for Windows (32 and 64-bit) and for Linux (32 and 64-bit).
On all other platforms and on x86 processors that do not support SSE2, e.g. Intel processors predating Pentium 4 (2001) and AMD processors predating K8 (2003), the previous (Version 12.0) code applies.
Simple indexing and indexed assignment have been completely rewritten. They consume much less memory and many special cases are optimized.
The set functions (index of, membership, union, intersect, without, unique) now use improved algorithms for certain data types:
Small-range arrays, including all single-byte data types, use a table look-up algorithm that is considerably faster than the old hashing algorithm.
In cases where one argument has a small number of elements, the functions now use an optimized linear search mechanism.
Outer products (e.g. ∘.∩) use retained hash or index tables where relevant.
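The general idea behind a table look-up for small-range data can be sketched in APL itself. The dfn below is an illustration only, not the interpreter's implementation: TableMember is a hypothetical name, and the sketch assumes integer vector arguments with a non-empty right argument. It builds a Boolean table spanning the range of the right argument and then answers membership by indexing rather than by hashing.

```apl
      TableMember←{
          ⎕IO←0                      ⍝ local index origin for this sketch
          lo←⌊/⍵ ⋄ n←1+(⌈/⍵)-lo      ⍝ table covers lo..max of ⍵
          t←n⍴0 ⋄ t[⍵-lo]←1          ⍝ mark each value present in ⍵
          i←⍺-lo ⋄ ok←(i≥0)∧i<n      ⍝ left items that fall inside the table
          ok\t[ok/i]                 ⍝ look up in-range items; others give 0
      }
      2 7 5 TableMember 3 5 7
0 1 1
      2 7 5∊3 5 7                   ⍝ primitive membership, for comparison
0 1 1
```

The table has one entry per value in the range, which is why this approach only pays off when the range is small relative to the number of items.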
Transpose has been optimized.