This file describes the EEP 3.1 compressed file format.

MPI/ANT (eeprobe@ant-software.nl)
Max-Planck-Institute of Cognitive Neuroscience, Leipzig

$Id$


1. Overall Structure
--------------------

The cnt file format conforms to the RIFF (Resource Interchange File Format)
specification. The (unregistered) form type is .

RIFF means that the file is composed of data blocks (chunks) which are
accessible via an ID (a four-character code, FOURCC) in a hierarchy (chunk
tree). Data are stored low-byte first ("Intel-style") unless mentioned
otherwise. Refer to a RIFF documentation for details; I have used
Josef Poepsel, "Multimediale Klippen", c't 11/94, p. 327 ff., Heinz Heise
Verlag.


2. Chunk Tree
-------------

  archived NeuroScan header, optional
  archived REFA configuration, optional
  ASCII header (labels, scalings ...)
  LIST  compressed data
    channel sequence
    compressed data
    epoch offsets in data chunk
  session events, optional

It is safe to add more chunks at the top level. Existing EEP modules copy
unknown chunks from input files to output files. There is no guaranteed
top-level chunk sequence; one has to look for the interesting chunks.


3. Chunk Contents
-----------------

3.1 Just a copy of the binary NeuroScan header
    (only present if one was available at file creation time).

3.2 Just a copy of the REFA acquisition configuration
    (only present if the file was generated by refa2cnt).

3.3 Global file information in plain ASCII text. See the example:

  [Sampling Rate]
  249.9999881256
  [Samples]
  36350
  [Channels]
  4
  [Basic Channel Data]
  ;label calibration       factor
  EOGV   1.7187500000e+00  4.8828125000e-02  uV
  EOGH   1.7187500000e+00  4.8828125000e-02  uV
  E1     1.7187500000e+00  4.8828125000e-02  uV
  E2     1.7187500000e+00  4.8828125000e-02  uV
  [History]
  ns2riff 3.7 (OSF1 V4.0 alpha) Wed Nov 26 12:27:41 1997
  EOH

The first column stores the unique(!) channel label. It can consist of at
most 10 alphanumeric characters. Channel labels are case insensitive; it is
an error to have both an "a1" and an "A1" channel in one file.
Columns 2 and 3 contain two scaling factors for each channel. Most
amplifiers use separate calibration/amplification/definition factors for
real-world conversion, and I did not want to merge them. Column 4 is for a
unit string of at most 10 characters. Each 16-bit sample value must be
multiplied by both factors to convert it to a value in the listed unit.
Note that the Latin "u" is used for "micro" instead of the Greek "mu".
The [History] block is optional. Each entry is one free-form line of text.
The final EOH means "End Of History" and is required if a [History] block
is present.
The comment line ";label calibration factor" is required.

3.4 A list of sample/code pairs; the number of pairs can be calculated as
    chunksize / 12.

  bytes: |     4     |    8    |     4     |    8    | ...
  value: | sample[0] | code[0] | sample[1] | code[1] | ...

  sample   time point of the event as a 0-based sample index
  code     alphanumeric event code, 0-terminated if shorter than
           8 characters
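As an illustration of this layout, here is a minimal C sketch of reading
such an event table from an already-loaded chunk body. The type and
function names (eep_event, read_event_chunk) are hypothetical and not part
of the format or the EEP sources; error handling is omitted.

  #include <stdint.h>
  #include <string.h>

  /* hypothetical in-memory representation of one event */
  typedef struct {
      uint32_t sample;    /* 0-based sample index                 */
      char     code[9];   /* up to 8 characters, plus forced '\0' */
  } eep_event;

  /* parse a chunk body of chunk_size bytes into events[];
     returns the number of events, i.e. chunk_size / 12 */
  static size_t read_event_chunk(const uint8_t *body, size_t chunk_size,
                                 eep_event *events, size_t max_events)
  {
      size_t n = chunk_size / 12, i;
      if (n > max_events) n = max_events;
      for (i = 0; i < n; i++) {
          const uint8_t *p = body + i * 12;
          /* the sample index is stored low-byte first ("Intel-style") */
          events[i].sample = (uint32_t)p[0]         | ((uint32_t)p[1] << 8) |
                             ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
          /* the code is 0-terminated only if shorter than 8 characters,
             so force a terminator after copying all 8 bytes */
          memcpy(events[i].code, p + 4, 8);
          events[i].code[8] = '\0';
      }
      return n;
  }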
3.5 A LIST chunk which contains the compressed data matrix and the
    information needed for decompression, in three subchunks ("raw3" is the
    name I gave the compression algorithm).

3.5.1 The data channels are rearranged to improve the compression ratio.
      The chunk stores the channel indices in the original record - one
      signed 16-bit, 0-based value for each channel in the file.

3.5.2 The huge compressed data chunk. The record is stored channelwise in
      epochs. Each of these signal pieces contains the data of a few
      hundred sample points (typically 1 second). The actual epoch_length
      is stored in the epoch offsets chunk (see 3.5.3 below).

  |chan_0_epoch_0|chan_1_epoch_0| ... |chan_n-1_epoch_0|chan_0_epoch_1| ...

Each of the blocks above stores residuals (the signal prediction errors of
the compression algorithm) in a compressed form, together with the rules
how to read the residuals and how to rebuild the original data.
Block starts are aligned to byte boundaries. The block data themselves are
counted in bits, and there may be unused bits in the last byte of a block.
All values are stored with the MSB first.

The general form of a block is:

  bits:  |   4    |   ?    |  ?   |
  value: | method | header | data |

The actual header and data layout depends on the prediction method and the
required bit width (16 or 32). Possible methods are:

   0   no residuals, original values stored
   1   residuals from the first difference
   2   residuals from the second difference
   3   residuals from the difference between the first difference and the
       first difference of the neighbor channel
   8
   9
  10   same as 0, 1, 2, 3 but for 32-bit data
  11

For methods 1, 2 and 3 the block layout is (n is the number of samples in
the current epoch):

  bits:  |   4    |   4   |    4     |  16  | nbits or (nbits + nexcbits) |
  value: | method | nbits | nexcbits | y[0] | r[1]       ..        r[n-1] |

For methods 9, 10 and 11:

  bits:  |   4    |   6   |    6     |  32  | nbits or (nbits + nexcbits) |
  value: | method | nbits | nexcbits | y[0] | r[1]       ..        r[n-1] |

  nbits      number of bits needed to store a "regular" residual
  nexcbits   number of bits needed to store an "exceptional" residual
             (0 means 16 in methods 1, 2 and 3)
  y[0]       first sample value
  r[1] .. r[n-1]
             residuals; read nbits bits for each residual value; if this
             value equals -(2^(nbits-1)), read the next nexcbits bits to
             get the residual

The signal values y[1] .. y[n-1] are computed from the residuals as follows
(a decoding sketch for the 16-bit methods is given after 3.5.3):

  method 1 or 9:
    y[i] = y[i-1] + r[i]                              i = 1 .. n-1

  method 2 or 10:
    y[1] = y[0] + r[1]
    y[i] = y[i-1] + (y[i-1] - y[i-2]) + r[i]          i = 2 .. n-1

  method 3 or 11:
    y[i] = y[i-1] + (Y[i] - Y[i-1]) + r[i]            i = 1 .. n-1

    where Y denotes the previously decompressed channel, or a vector filled
    with zeroes if y is the first channel.

Method 0 is simple (there is no compression):

  bits:  |   4    |   4   |  16   ..   16  |
  value: | method | dummy | y[0] .. y[n-1] |

  method           always 0
  dummy            unused, undefined
  y[0] .. y[n-1]   signal values

Method 8 is equivalent, with 32-bit values:

  bits:  |   4    |   4   |  32   ..   32  |
  value: | method | dummy | y[0] .. y[n-1] |

3.5.3 The total number of epochs (ne) can be calculated as
      (chunksize - 4) / 4.

  bytes: |      4       |       4        ..         4         |
  value: | epoch_length | epoch_start[0] .. epoch_start[ne-1] |

  epoch_length   length of compressed epochs in samples; the last epoch can
                 be shorter (Samples % epoch_length)
  epoch_start    start of the epoch as a byte index into the data chunk
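As an illustration of the block decoding rules in 3.5.2, here is a minimal
C sketch that reads one 16-bit block (method 1, 2 or 3) and rebuilds the
signal values. It assumes the caller already knows the number of samples n
in the current epoch and, for method 3, has the previously decompressed
neighbor channel Y available (all zeroes for the first channel). The helper
names (bitreader, get_bits, sign_extend, decode_block16) are hypothetical
and not taken from the EEP sources; methods 0/8 and the 32-bit methods
9-11 (6-bit nbits/nexcbits fields, 32-bit y[0]) are not handled here.

  #include <stddef.h>
  #include <stdint.h>

  /* MSB-first bit reader over the block bytes */
  typedef struct { const uint8_t *buf; size_t bitpos; } bitreader;

  static uint32_t get_bits(bitreader *br, unsigned nbits)
  {
      uint32_t v = 0;
      while (nbits--) {
          uint8_t bit = (br->buf[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1;
          v = (v << 1) | bit;
          br->bitpos++;
      }
      return v;
  }

  /* interpret an nbits-wide value as two's complement */
  static int32_t sign_extend(uint32_t v, unsigned nbits)
  {
      uint32_t m = 1u << (nbits - 1);
      return (int32_t)((v ^ m) - m);
  }

  /* decode one 16-bit block (method 1, 2 or 3) into y[0..n-1] */
  static void decode_block16(bitreader *br, int32_t *y, const int32_t *Y,
                             size_t n)
  {
      unsigned method   = get_bits(br, 4);
      unsigned nbits    = get_bits(br, 4);
      unsigned nexcbits = get_bits(br, 4);
      size_t i;

      if (nexcbits == 0) nexcbits = 16;          /* 0 means 16 for methods 1..3 */
      y[0] = sign_extend(get_bits(br, 16), 16);  /* first sample, stored as-is  */

      for (i = 1; i < n; i++) {
          int32_t r = sign_extend(get_bits(br, nbits), nbits);
          if (r == -(int32_t)(1u << (nbits - 1)))    /* "exceptional" residual   */
              r = sign_extend(get_bits(br, nexcbits), nexcbits);

          switch (method) {
          case 1: y[i] = y[i-1] + r; break;                         /* 1st diff */
          case 2: y[i] = (i == 1) ? y[0] + r                        /* 2nd diff */
                                  : y[i-1] + (y[i-1] - y[i-2]) + r; break;
          case 3: y[i] = y[i-1] + (Y[i] - Y[i-1]) + r; break;  /* neighbor diff */
          }
      }
  }

A full reader would loop over the epoch_start offsets from 3.5.3, decode
the channels of each epoch in the order given by the channel sequence
chunk, and finally apply the two scaling factors from the ASCII header to
obtain values in physical units.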