Eviews workfile format

Allin Cottrell
Wake Forest University, July 2005

Introduction

Eviews is a popular proprietary econometrics program. It is widely used in teaching, and in various places around the Internet one can find datasets made available by publishers of textbooks or by professors in the Eviews workfile format. It struck me that it would be useful if gretl could read this format. There does not appear to be any publicly available specification (not surprising for a proprietary binary format), so I decided to try reverse-engineering. This document sets out my findings. The findings are based on examination of several workfiles from different sources and dates (using Emacs in hexl mode, the strings program, and an exploratory reader program written in C), but I have no idea how general they are. I welcome any corrections or additions.

Overview of format

An Eviews workfile starts with an identifying string, "New MicroTSP Workfile", which seems to be padded out to 24 bytes with NUL characters. This is followed by a header of variable size, within which certain key information nonetheless seems to occur at fixed offsets. Then comes a series of 70-byte records, containing information on the data series in the file (and possibly information on other objects in some cases?). The central section of the file contains blocks of actual data, stored as doubles, along with other information on the variables. The stream positions of these blocks are given in the preceding 70-byte records. All numbers seem to be stored in little-endian byte order. The examples I have seen also have a substantial swathe of NUL bytes in the central section. The file ends with a trailer section that includes the name of the file and strings representing the starting and ending observations.

Header section

As mentioned above, the header is of variable size. I'm unsure of exactly where the header ends and the series of 70-byte records begins, so I don't know the exact size of the header in any instance, but a common size seems to be 144 or 146 bytes (excluding the leading 24 bytes). In some files I've looked at, the header is 32 bytes larger than this. The fields within the header that appear to be fixed are shown below (byte offsets are decimal and relative to the start of the file; lengths are in bytes). As you'll see, there's a lot here that doesn't yet make sense to me.

offset  length    comment
0       80        ???
80      8         long: size of header
88      26        ???
114     4         int: number of variables + 1
118     4         time_t: date of last modification of the file (or zero)
122     2         ???
124     2         short: data frequency (e.g. 1 for annual, 4 for quarterly)
126     2         short: starting subperiod (for, e.g., quarterly or monthly data) or zero
128     4         int: starting observation (e.g. year)
132     8         ???
140     4         int: number of observations
144     variable  ??? (mostly NULs)

The long at offset 80 gives a number that is closely related to the stream position of the start of the series of 70-byte variable records. For example, in some files the value is 144. Add 24 bytes for the initial identifier and you get 168. In such files, I've been able to start reading 70-byte records at byte 168. In other files, the value at offset 80 is 176, and one can read 70-byte records from byte 200 (= 176 + 24). But the alignment of the records then looks a little funny, and it seems "cleaner" to start 2 bytes later (padding?), that is, at the value from offset 80 plus 26.

The int at offset 114 seems to give the number of variables in the file plus one. Perhaps this number counts some other object that I haven't attempted to parse.
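As an illustration of the header table, here is a minimal Python sketch that pulls out the fields whose meaning seems clear. It assumes the whole file has been read into a bytes object, that all values are little-endian, and that the time_t is a 4-byte unsigned value. The names Header, parse_header and record_start_candidates are made up for this example, and the offsets are simply those guessed at above.

```python
import struct
from typing import NamedTuple

MAGIC = b"New MicroTSP Workfile"  # identifying string; NUL padding not checked


class Header(NamedTuple):
    hlen: int          # long at offset 80 ("size of header")
    nvars_plus_1: int  # int at offset 114
    mtime: int         # 4-byte time_t at offset 118 (or zero)
    freq: int          # short at offset 124
    start_sub: int     # short at offset 126
    start_obs: int     # int at offset 128
    nobs: int          # int at offset 140


def parse_header(buf: bytes) -> Header:
    """Extract the understood header fields; offsets are from start of file."""
    if not buf.startswith(MAGIC):
        raise ValueError("identifying string not found")
    if len(buf) < 144:
        raise ValueError("file too short to hold the fixed header fields")
    (hlen,) = struct.unpack_from("<q", buf, 80)
    nvars_plus_1, mtime = struct.unpack_from("<iI", buf, 114)
    freq, start_sub, start_obs = struct.unpack_from("<hhi", buf, 124)
    (nobs,) = struct.unpack_from("<i", buf, 140)
    return Header(hlen, nvars_plus_1, mtime, freq, start_sub, start_obs, nobs)


def record_start_candidates(hlen: int) -> tuple:
    """The two plausible positions of the first 70-byte record:
    hlen + 24 (header size plus identifier) and the 'cleaner' hlen + 26."""
    return (hlen + 24, hlen + 26)
```

For a file whose long at offset 80 is 144, record_start_candidates(144) gives (168, 170), matching the arithmetic above. Which of the two positions is right depends on where the 70-byte records are taken to begin.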

70-byte records for variables

These records contain stuff that I haven't been able to parse at both the beginning and the end, so I'm unsure of their precise alignment. The first clearly identifiable element is an int that gives the size of the further record containing the data for the variable in question. The table below is based on the assumption that this element is located at a byte offset of 6 into the record. It's possible that the record starts two bytes earlier, in which case the offsets below would have to be augmented by 2.

offset  length    comment
0       6         ???
6       4         int: size of data record (pointed to by the value at offset 14)
10      4         int: size of actual data block
14      8         long: stream position for further information plus actual data values
22      32        string: the name of the variable, padded to the right with NULs
54      8         long: stream position for "history" information, or zero if there's no history info
62      2         short: code representing the nature of the object?
64      6         ???

The unknown material at the start and end of the record adds up to 12 "mystery bytes." Typically, these bytes seem to be identical for all the regular variables in a given workfile.

The "code" at offset 62, read as a short int, seems to be 44 for regular variables, and 43 for the special variable "C" (the constant). There may be other codes too. Following the pointer at offset 54, if it is non-zero, leads you to information on the "revision history" of the variable, and following the pointer at offset 14 leads you to an actual data block. These blocks are described in the following sections.

Note that each Eviews workfile seems to contain two "boilerplate" variables: the constant "C" and a residuals series "RESID". If no equation has been estimated, "RESID" just contains missing values.
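The record layout can be sketched in Python as follows. This is only an illustration under the assumptions stated above: the size field sits at offset 6, values are little-endian, and variable names are plain ASCII (the last point is a guess). VarRecord, parse_var_record and is_regular_series are hypothetical names.

```python
import struct
from typing import NamedTuple

RECORD_SIZE = 70
CODE_SERIES = 44    # regular variables, in the files examined
CODE_CONSTANT = 43  # the special variable "C"


class VarRecord(NamedTuple):
    rec_size: int   # int at offset 6: size of data record
    data_size: int  # int at offset 10: size of actual data block
    data_pos: int   # long at offset 14: stream position of data record
    name: str       # 32 bytes at offset 22, NUL-padded
    hist_pos: int   # long at offset 54: history position, or zero
    code: int       # short at offset 62


def parse_var_record(rec: bytes) -> VarRecord:
    """Decode one 70-byte record; the 12 'mystery bytes' are ignored."""
    if len(rec) != RECORD_SIZE:
        raise ValueError("expected a 70-byte record")
    rec_size, data_size, data_pos = struct.unpack_from("<iiq", rec, 6)
    # Keep everything up to the first NUL in the 32-byte name field.
    name = rec[22:54].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    hist_pos, code = struct.unpack_from("<qh", rec, 54)
    return VarRecord(rec_size, data_size, data_pos, name, hist_pos, code)


def is_regular_series(v: VarRecord) -> bool:
    """True for a 'code 44' record other than the boilerplate RESID."""
    return v.code == CODE_SERIES and v.name != "RESID"
```

If the records really start two bytes earlier, the same function works on a slice taken two bytes later in the stream, since only the offsets relative to the size field matter.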

Revision history block

offset  length    comment
0       2         ???
2       4         int: length of revision string
6       4         int: another length?
10      8         long: stream position of revision string

If you follow up at the stream position given in the last element, you find a string giving info on how the variable in question has been redefined. The length of this string is variable, and is apparently given by the second element above (the int at offset 2). In examples I've seen, the int at offset 6 has the same value as the one at offset 2. Perhaps it's a length that is inclusive of something else unknown, which happens to be zero in the cases I've looked at?
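Following the history pointer might look like the Python sketch below. It assumes the file is held in a bytes object, that hist_pos is the non-zero long from offset 54 of a variable record, and that the string can be decoded as Latin-1 (the actual encoding is unknown; Latin-1 is chosen only because it never fails to decode). The function name is hypothetical.

```python
import struct


def read_revision_string(buf: bytes, hist_pos: int) -> str:
    """Read the revision string reached via a revision history block."""
    # int at offset 2: string length; int at offset 6: second length of
    # unclear meaning; long at offset 10: stream position of the string.
    str_len, _other_len, str_pos = struct.unpack_from("<iiq", buf, hist_pos + 2)
    if str_len < 0 or str_pos < 0 or str_pos + str_len > len(buf):
        raise ValueError("revision string lies outside the file")
    raw = buf[str_pos:str_pos + str_len]
    return raw.rstrip(b"\x00").decode("latin-1")
```

The second length is read but not used, since its meaning is unclear.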

Data block

offset  length    comment
0       4         int: number of observations
4       4         int: starting observation
8       8         ??? (NULs in cases I've seen)
16      4         int: ending observation
20      2         ??? (NULs in cases I've seen)
22      variable  doubles: data values

From byte 22 onward, it seems one can find a number of doubles given by the first int field of the data block. Missing values ("NA") are represented by 1e-37.
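A data block laid out as in the table could be read with something like the following Python sketch. It assumes little-endian 8-byte doubles and that the missing-value marker compares exactly equal to the double nearest 1e-37. That equality test is an assumption, and a small tolerance might be safer in practice. The function name and the use of None for "NA" are choices made for this example.

```python
import struct

NA_SENTINEL = 1e-37  # value that seems to mark a missing observation


def read_data_block(buf: bytes, pos: int):
    """Return (nobs, start_obs, end_obs, values) for the block at pos.

    Missing observations are returned as None.
    """
    nobs, start_obs = struct.unpack_from("<ii", buf, pos)
    (end_obs,) = struct.unpack_from("<i", buf, pos + 16)
    if nobs < 0 or pos + 22 + 8 * nobs > len(buf):
        raise ValueError("implausible observation count")
    raw = struct.unpack_from(f"<{nobs}d", buf, pos + 22)
    values = [None if x == NA_SENTINEL else x for x in raw]
    return nobs, start_obs, end_obs, values
```

The unexplained bytes at offsets 8 and 20 are skipped rather than checked, since it is not known whether they are always NULs.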

Algorithm for reading the files

On the basis of the foregoing, here is my algorithm for reading data series out of an Eviews workfile.

  1. Check that the file begins with the string "New MicroTSP Workfile", and reject it if it does not.

  2. Read the basic dataset information from the header, using the offsets given in the first table above (number of variables, number of observations, frequency, starting observation, and so on). Let hlen denote the long int value read at offset 80, and let n denote the int value read at offset 114.

  3. Advance the read position to hlen + 26, and start reading (n-1) 70-byte records. At each record, check that the "code" (short at offset 62) equals 44. If it's 43, skip forward 70 bytes, ignoring the constant. Else if it's not 44, either try skipping 70 bytes or abort: you've found something that is unknown on the basis of my investigations. Also check the 32-byte variable name at offset 22: if it's "RESID", skip 70 bytes forward because there's nothing of interest there.

  4. If you got a "code 44" and the name is not "RESID", go to the stream position given by the long int at offset 14 in the variable record. Read the number of observations given by the int at offset 0 in this block. If this does not equal the global number of observations (call it T) given in the header, I'm not sure what to do; in all the examples I've looked at the values have been equal. Now go to offset 22 in the block and try reading T doubles. Watch out for values of 1e-37 (missing) -- and also for NaN (not a number), which would indicate that you've somehow got off track.

  5. After each successful read of a variable record, advance your basic reading position by 70 bytes.

  6. When you've checked out n-1 records, stop.
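The six steps above can be sketched compactly in Python. This is an illustration rather than a tested reader. It assumes the whole file fits in memory, skips any record whose code is not 44 (rather than aborting), raises an error when a block's observation count differs from T, and treats names as ASCII. The function name read_series and the use of None for missing values are choices made for this example.

```python
import math
import struct

MAGIC = b"New MicroTSP Workfile"
NA_SENTINEL = 1e-37


def read_series(buf: bytes) -> dict:
    """Return {name: [values]} for the regular series in a workfile image."""
    # Step 1: check the identifying string.
    if not buf.startswith(MAGIC):
        raise ValueError("not an Eviews workfile")
    # Step 2: basic information from the header.
    (hlen,) = struct.unpack_from("<q", buf, 80)
    (n,) = struct.unpack_from("<i", buf, 114)
    (T,) = struct.unpack_from("<i", buf, 140)
    series = {}
    pos = hlen + 26  # Step 3: position of the first 70-byte record
    for _ in range(n - 1):  # Step 6: stop after n-1 records
        rec = buf[pos:pos + 70]
        pos += 70  # Step 5: advance to the next record
        if len(rec) < 70:
            raise ValueError("file ends inside the variable records")
        (data_pos,) = struct.unpack_from("<q", rec, 14)
        name = rec[22:54].split(b"\x00", 1)[0].decode("ascii", "replace")
        (code,) = struct.unpack_from("<h", rec, 62)
        if code != 44 or name == "RESID":
            continue  # the constant (43), unknown codes, and RESID
        # Step 4: follow the pointer to the data block.
        (nobs,) = struct.unpack_from("<i", buf, data_pos)
        if nobs != T:
            raise ValueError(f"{name}: block has {nobs} observations, not {T}")
        raw = struct.unpack_from(f"<{T}d", buf, data_pos + 22)
        if any(math.isnan(x) for x in raw):
            raise ValueError(f"{name}: NaN found, probably off track")
        series[name] = [None if x == NA_SENTINEL else x for x in raw]
    return series
```

Given a path, the function could be called as read_series(open(path, "rb").read()). Skipping unknown codes instead of aborting is the more forgiving of the two options mentioned in step 3.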

And that's it for now.