DC_READ_FREE Function

Reads freely-formatted ASCII files.


Returned Value

status The value returned by DC_READ_FREE; expected values are:

< 0	Indicates an error, such as an invalid filename or an I/O error.
0	Indicates a successful read.

Keywords

Column A flag that signifies that filename is a column-organized file.

Delim An array of single-character strings that are the field separators used in the data file. If not provided, a comma- or space-delimited file is assumed.

Dt_Template An array of integers indicating the date/time templates that are to be used for interpreting date/time data. Positive numbers refer to date templates; negative numbers refer to time templates. For more details, see Example 5. To see a complete list of date/time templates, see the PV-WAVE Programmer's Guide.

Filters An array of one-character strings that PV-WAVE should check for and filter out as it reads the data. A character found on the keyboard can be typed; a special character not found on the keyboard is specified by ASCII code. For more details about the Filters keyword, see Filtering and Substitution While Reading Data.

Get_Columns An array of integers indicating column numbers to read in the file. If not provided or if set equal to zero (0), all columns are read. Ignored if the Row keyword is supplied.

Ignore An array of strings; if any of these strings are encountered, PV-WAVE skips the entire line and starts reading data from the next line. Any string is allowed, but the following three strings have special meanings:


`$BLANK_LINES`	Skip all blank lines; this prevents those lines from being interpreted as a series of zeroes.
`$TEXT_IN_NUMERIC`	Skip any line where text is found in a numeric field.
`$BAD_DATE_TIME`	Skip any line where invalid date/time data is found.

For an example showing how to use the Ignore keyword, see Example 2.

Miss_Str A string array that specifies strings that may be present in the data file to represent missing data. If not present, PV-WAVE does not check for missing data as it reads the file.

Miss_Vals An array of floating-point values, each of which corresponds to a string in Miss_Str. As PV-WAVE reads the input data file, occurrences of strings that match those in Miss_Str are replaced by the corresponding element of Miss_Vals.

Nrecs Number of records to read. If not provided or if set equal to zero (0), the entire file is read. For more information about records, see Physical Records vs. Logical Records.

Nskip Number of physical records in the file to skip before data is read. If not provided or if set equal to zero (0), no records are skipped.

Resize An array of integers indicating the variables in var_list that can be resized based on the number of records detected in the input data file. Values in Resize should be in the range 1 ≤ Resize_n ≤ #_of_vars_in_var_list, where 1 refers to the first variable in var_list.

For an example showing how to use the Resize keyword, see DC_READ_FIXED, Example 4.

Row A flag that signifies filename is a row-organized file. If neither Row nor Column is present, Row is the default.

Vals_Per_Rec A long integer that specifies how many values comprise a single record in the input data file; use only with column-oriented files. If not provided, each line of data in the file is treated as a new record. For more details about when to use the Vals_Per_Rec keyword, see Example 4.

Discussion

DC_READ_FREE is very adept at reading column-oriented data files. Also, DC_READ_FREE handles many steps that you have to do yourself when using other PV-WAVE functions and procedures. These steps include: 1) opening the file, 2) assigning it a logical unit number (LUN), and 3) closing the file when you are done reading the data.

DC_READ_FREE relieves you of the task of composing a format string that describes the organization of the data in the input file. All you do is tell DC_READ_FREE which delimiters to expect in the file; comma and space are the default delimiters expected. In other words, DC_READ_FREE easily reads data values separated by any combination of commas and spaces, or any other delimiters that you explicitly define using the Delim keyword.

If neither the Row nor the Column keyword is provided, the file is assumed to be organized by rows. If both keywords are used, Row takes precedence.

NOTE: This function can be used to read data into date/time structures, but not into any other kind of structures.

String Resources Used By This Function

Upon execution, the DC_READ_FREE function examines two strings in a string resource file. These strings, described below, allow you to control how the function handles binary files.

The string resource file is:

(UNIX) wavedir/xres/!Lang/kernel/dc.ads
(OpenVMS) wavedir:[XRES.!Lang.KERNEL]DC.ADS
(Windows) wavedir\xres\!Lang\kernel\dc.ads

where wavedir is the main PV-WAVE directory.

The strings that are examined are DC_binary_check and DC_allow_chars.

DC_binary_check This string can be set to the values True or False. If set to True, the data file is checked for the presence of binary characters before the file is read. If binary characters are found, the file is not read. If this string is set to False, no binary character checking is performed. (Default: True)

For example, to turn off binary checking, set the string as follows in the dc.ads file:

DC_binary_check: False

DC_allow_chars This string lets you specify additional characters to allow in the check for binary files. Before a file is read, the first several lines are checked for non-printable characters. If any are found, the file is considered to be a binary file and is not read. By default, all printable characters in the system locale are allowed. Characters can be specified either directly or numerically, as three-digit decimal values preceded by a backslash (\).

For example, to allow characters 165 and 220, set the string as follows in the dc.ads file:

DC_allow_chars: \165\220
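The escape syntax and the binary check described above can be modeled in a few lines of Python. This is a minimal sketch and not PV-WAVE's implementation. The function names are hypothetical, and "printable" is assumed to mean ASCII 32-126 plus tab (a C locale). The number of lines checked is also an arbitrary choice, because the text only says "the first several lines".

```python
import re

# Assumed syntax: a backslash plus exactly three decimal digits names a
# character by code; any other character (including a lone backslash) is literal.
_ALLOW_TOKEN = re.compile(r"\\([0-9]{3})|(.)", re.DOTALL)


def parse_allow_chars(spec):
    """Return the set of character codes named by a DC_allow_chars-style value."""
    codes = set()
    for m in _ALLOW_TOKEN.finditer(spec):
        codes.add(int(m.group(1)) if m.group(1) is not None else ord(m.group(2)))
    return codes


def looks_binary(data, allowed, check_lines=5):
    """Return True if the first check_lines lines of data contain a byte that is
    neither printable (assumed: ASCII 32-126 or tab) nor in allowed."""
    for line in data.splitlines()[:check_lines]:
        for byte in line:
            if not (32 <= byte <= 126 or byte == 9) and byte not in allowed:
                return True
    return False


allowed = parse_allow_chars(r"\165\220")       # {165, 220}
print(looks_binary(b"1 2 \xa5\n", set()))      # True: byte 165 is not printable ASCII
print(looks_binary(b"1 2 \xa5\n", allowed))    # False: 165 was allowed explicitly
```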

How the Data is Transferred into Variables

As many as 255 variables can be included in the input argument var_list. You can use the continuation character ($) to continue the function call onto additional lines, if needed. Any undeclared variables in var_list are assumed to have a data type of float (single-precision floating-point).

As data is being transferred into multi-dimensional variables, those variables are treated as collections of scalar variables, meaning the first subscript of the import variable varies the fastest. For two-dimensional import variables, this implies that the column index varies faster than the row index. In other words, data is transferred into a two-dimensional import variable one row at a time. For more details about reading column-oriented data into multi-dimensional variables, see Example 4.
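The fill order can be illustrated with a short Python sketch. It models the rule stated above and is not PV-WAVE code. The hypothetical function stores arr[c][r] to mirror PV-WAVE's arr(c, r), where c is the column index and r is the row index. It assigns values so that c changes fastest.

```python
def fill_first_subscript_fastest(values, ncols, nrows):
    """Store values in an ncols x nrows array, first subscript varying fastest.

    arr[c][r] plays the role of PV-WAVE's arr(c, r): c is the column index,
    r is the row index. Consecutive values fill one row before the next.
    """
    if len(values) != ncols * nrows:
        raise ValueError("expected exactly ncols * nrows values")
    arr = [[None] * nrows for _ in range(ncols)]
    for k, value in enumerate(values):
        c = k % ncols    # column index advances on every value
        r = k // ncols   # row index advances once per ncols values
        arr[c][r] = value
    return arr


grid = fill_first_subscript_fastest([1, 2, 3, 4, 5, 6], ncols=3, nrows=2)
first_row = [grid[c][0] for c in range(3)]   # [1, 2, 3]
second_row = [grid[c][1] for c in range(3)]  # [4, 5, 6]
```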

If the current input line is empty or DC_READ_FREE has reached the end of the line and there are still unused variables in var_list, the next line is read. When there are no unused variables left in var_list, the remainder of the line is ignored.

When reading into numeric variables, PV-WAVE attempts to convert the input into a value of the expected type. Decimal points are optional and scientific notation is allowed. If a real value is provided for an integer variable, the value is truncated at the decimal point.
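A minimal Python sketch of these conversion rules follows. The function name is hypothetical, and Python's float() stands in for PV-WAVE's own parser. The sketch uses int() because it truncates toward zero, which matches truncation at the decimal point.

```python
def convert_field(text, target):
    """Convert one field of text to the type of the receiving variable.

    Decimal points are optional and scientific notation is accepted.
    A real value read into an integer variable is truncated at the decimal point.
    """
    value = float(text)      # accepts "12", "12.5", "1.25e1"
    if target is int:
        return int(value)    # int() drops the fractional part (truncates toward zero)
    return value


print(convert_field("1.25e1", float))  # 12.5
print(convert_field("12.9", int))      # 12
print(convert_field("-3.7", int))      # -3
```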

NOTE: If the file contains string data, make sure the strings do not contain delimiter characters. Otherwise, the string will be interpreted as more than one string, and the data in the file will not match the variable list.

Once all variables in the variable list have been filled with data, DC_READ_FREE stops reading and returns a status code of zero (0). Even if an error occurs and status is nonzero, the data that was read successfully before the error is returned in the var_list variables.

TIP: If an error does occur, use the PRINT command to view the contents of the variables to see where the last successfully read value occurs. This will enable you to isolate the portion of the file in which the error occurred.

Physical Records vs. Logical Records

In an ASCII text file, the end-of-line is signified by the presence of either a CTRL-J or a CTRL-M character, and a record extends from one end-of-line character to the next. However, there are actually two kinds of records:

physical records
logical records

For column-oriented files, a physical record often contains exactly one value for each variable in var_list, in which case it is also a logical record. For row-oriented files, logical records are not relevant. Data is read as contiguous values separated by delimiters, and the end-of-line is treated as just another delimiter.

NOTE: The Nrecs keyword counts by logical records, if they have been defined. The Nskip keyword, on the other hand, counts by physical records, regardless of any logical record size that has been defined.

Changing the Logical Record Size

You can use the Vals_Per_Rec keyword to explicitly define a different logical record size. In most cases, however, you do not need to provide this keyword. For an example of when to use the Vals_Per_Rec keyword, see Example 4.

NOTE: By default, DC_READ_FREE considers the physical record to be one line in the file, and the concept of a logical record is not needed. But if you are using logical records, the physical records in the file must all contain the same number of values. The Vals_Per_Rec keyword can be used only with column-oriented data files.
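The following Python sketch shows how a logical record size regroups values independently of line breaks, using the level.dat data from Example 4. It is an illustration under stated assumptions, not the PV-WAVE algorithm. Filter characters are removed before splitting, and every delimiter, as well as the end-of-line, ends a field. Empty fields between adjacent delimiters are assumed to be skipped. The function name is hypothetical.

```python
import re


def read_logical_records(lines, delims, filters, vals_per_rec):
    """Group the values in lines into logical records of vals_per_rec values."""
    # Assumption: delimiters and whitespace (including end-of-line) separate fields.
    splitter = re.compile("[" + re.escape("".join(delims)) + r"\s]+")
    values = []
    for line in lines:
        for ch in filters:                  # filtered characters are discarded
            line = line.replace(ch, "")
        values.extend(f for f in splitter.split(line) if f)  # skip empty fields
    if len(values) % vals_per_rec:
        raise ValueError("values do not divide evenly into logical records")
    return [values[i:i + vals_per_rec] for i in range(0, len(values), vals_per_rec)]


level_dat = [
    "5,992;17,121/8,348;17,562/5,672;19,451/",
    "5,459;18,659/7,088;17,052/8,541;13,437/",
    "6,362;15,894/8,992;17,509/7,785;14,796/",
]
records = read_logical_records(level_dat, [";", "/"], [","], vals_per_rec=2)
gap = [int(rec[0]) for rec in records]  # first value of each logical record
bar = [int(rec[1]) for rec in records]  # second value of each logical record
```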

Filtering and Substitution While Reading Data

If you want certain characters filtered out of the data as it is read, use the Filters keyword to specify these characters. Each character (or sequence of digits that represents the ASCII code for a character) must be enclosed in single quotes. For example, either of the following is a valid specification:

',' or '44'

The two specifications are equivalent, because 44 is the ASCII code for a comma. For another example of using the Filters keyword, see Example 4.

TIP: Be sure not to filter characters that were used in the file as delimiters. The delimiters enable DC_READ_FREE to discern where one data value ends and another one begins.

Characters that match one of the values in Filters are treated as if they aren't even there; in other words, these characters are not treated as data and do not contribute to the size of the logical record, if one has been defined using the Vals_Per_Rec keyword.

NOTE: If you want to supply multi-character strings instead of individual characters, you can do this with the Ignore keyword. However, keep in mind that a character that matches Filters is simply discarded, and filtering resumes from that point, while a string that matches Ignore causes that entire line to be skipped.

So if you are reading a data file that contains a value such as #$*10.00**, but you don't want the entire line to be skipped, filter the characters individually with Filters = ['#', '$', '*'] instead of collectively with Ignore = ['#$*', '**'].
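The difference between the two keywords can be sketched in Python. The helper names are hypothetical, and the special Ignore strings ($BLANK_LINES and so on) are not modeled. The sketch only contrasts discarding characters with discarding whole lines.

```python
def apply_filters(line, filters):
    """Discard each character listed in filters; the rest of the line is kept."""
    return "".join(ch for ch in line if ch not in filters)


def apply_ignore(lines, ignore):
    """Discard every line that contains any of the ignore strings."""
    return [line for line in lines if not any(s in line for s in ignore)]


data = ["#$*10.00**", "20.00"]
filtered = [apply_filters(line, ["#", "$", "*"]) for line in data]  # ['10.00', '20.00']
kept = apply_ignore(data, ["#$*", "**"])                            # ['20.00']
```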

Missing Data Substitution

PV-WAVE expects to substitute a value from Miss_Vals whenever it encounters a string from Miss_Str in the data. Consequently, if the number of elements in Miss_Str does not match the number of elements in Miss_Vals, a nonzero status is returned and no data is read. The maximum number of values permitted in Miss_Str and Miss_Vals is 10.

If the end of the file is reached before all variables are filled with data, the remainder of each variable is set to Miss_Vals(0) if it was specified, or 0 (zero) if Miss_Vals was not specified. In this case, status is returned with a value less than zero to signify an unexpected end-of-file condition.
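A minimal Python sketch of both rules follows, using the intake.dat values from Example 3. The function names are hypothetical. The value -1 stands in for whatever negative status PV-WAVE actually returns at an unexpected end-of-file; the text above only says the value is less than zero.

```python
def substitute_missing(fields, miss_str, miss_vals):
    """Replace each field that matches a Miss_Str entry with its Miss_Vals entry."""
    if len(miss_str) != len(miss_vals):
        raise ValueError("Miss_Str and Miss_Vals must have the same number of elements")
    if len(miss_str) > 10:
        raise ValueError("at most 10 missing-data strings are permitted")
    lookup = dict(zip(miss_str, miss_vals))
    return [lookup[f] if f in lookup else float(f) for f in fields]


def fill_to_size(values, size, miss_vals=None):
    """Copy values into a variable of length size; pad if the file ran out.

    Returns (data, status). Padding uses miss_vals[0] if given, else 0, and
    status is negative (here, a stand-in -1) when padding was needed.
    """
    data = list(values[:size])
    if len(data) == size:
        return data, 0
    pad = miss_vals[0] if miss_vals else 0
    return data + [pad] * (size - len(data)), -1


row = substitute_missing("151-182-BADY-214-515".split("-"),
                         ["BADX", "BADY"], [9999, -9999])
padded, status = fill_to_size([1.0, 2.0], 4, miss_vals=[9999, -9999])
```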

Delimiters in the Input File

Values in the file can be separated by commas, spaces, and any other delimiter characters specified with the Delim keyword. If the file uses any other character as a delimiter, that character is treated as data and type conversion is attempted. If type conversion is not possible, status is set to a value less than zero to signify an error condition.

NOTE: Separate data values in the file with a different delimiter from the one that separates the fields of dates and times, such as months, days, hours, and minutes. Otherwise, your date/time data may not be interpreted correctly. The only delimiters that can be used inside date/time data are slash (/), colon (:), hyphen (-), and comma (,).
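The effect of an undeclared delimiter can be shown with a small Python sketch. The helper names are hypothetical. The set of delimiter characters is passed in explicitly, so the sketch does not try to reproduce how PV-WAVE combines Delim with the default comma and space. It only shows that a character outside the set stays inside a field and makes numeric conversion fail.

```python
import re


def split_fields(line, delims):
    """Split line on any run of the characters in delims."""
    pattern = "[" + re.escape(delims) + "]+"
    return [f for f in re.split(pattern, line.strip()) if f]


def to_numbers(fields):
    """Convert fields to floats, or return None to signal a conversion error."""
    try:
        return [float(f) for f in fields]
    except ValueError:
        return None


ok = to_numbers(split_fields("1;2;3", ",; "))   # [1.0, 2.0, 3.0]
bad = to_numbers(split_fields("1;2;3", ", "))   # None: "1;2;3" stays one field
```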

Reading Row-Oriented Files

If you include the Row keyword, each variable in var_list is completely filled before any data is transferred to the next variable.

When reading row-oriented data, only the dimensionality of the last variable in var_list can be unknown; a variable of length n is created, where n is the number of values remaining in the file. All other variables in var_list must be pre-dimensioned.

If you include the Resize keyword with the call to the function DC_READ_FREE, the last variable can be redimensioned to match the actual number of values that were transferred to the variable during the read operation.
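Row-oriented filling can be modeled in Python as shown below, using the values of measure.dat from Example 2. The function is hypothetical. Each size is a pre-dimensioned length, None marks an undefined last variable, and resize_last imitates including the last variable in Resize.

```python
def read_rows(values, sizes, resize_last=False):
    """Fill variables one after another, as in Row mode.

    sizes[i] is the pre-dimensioned length of variable i. Only the last entry
    may be None (an undefined variable); it then receives every remaining
    value, as it also does when resize_last is True.
    """
    out, pos = [], 0
    for i, size in enumerate(sizes):
        is_last = i == len(sizes) - 1
        if is_last and (size is None or resize_last):
            size = len(values) - pos
        elif size is None:
            raise ValueError("only the last variable may be undefined")
        out.append(values[pos:pos + size])
        pos += size
    return out


measure_dat = """
0   5  10  15  20  25  30  35  40  45  50  56  61  66  71  76  81  86  91

 96 101 107 112 117 122 127 132 137 142 147 152 158 163 168 173 178 183 188

 193 198 203 209 214 219 224 229 234 239 244 249 255 255 255 255 255 255 255

 255 255 255 255 255
"""
values = [int(v) for v in measure_dat.split()]  # split() skips blank lines
var1, var2 = read_rows(values, [5, 5])
_, var2_resized = read_rows(values, [5, 5], resize_last=True)
```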

For an illustration of how row-oriented data can look inside a file, see the PV-WAVE Programmer's Guide.

Reading Column-Oriented Files

If you include the Column keyword, DC_READ_FREE views the data file as a series of columns, with a one-to-one correspondence between columns in the file and variables in the variable list. In other words, one value from the first record of the file is transferred into each variable in var_list, then another value from the next record of the file is transferred into each variable in var_list, and so forth, until all the data in the file has been read or until the variables are completely filled with data.

If a variable in var_list is undefined, a floating-point variable of length n is created, where n is the number of records read from the file. To get a similar effect in an existing variable, include the Resize keyword with the function call.

All variables specified with the Resize keyword are redimensioned to the same length: the length of the longest column of data in the file. Variables that correspond to shorter columns are padded at the end with one or more values. The padding value is Miss_Vals(0) if it was specified, or 0 (zero) if Miss_Vals was not specified.
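A Python sketch of column-oriented reading, checked against monotonic.dat from Example 1, follows. It is a simplified model with a hypothetical function name. Fields are separated by whitespace only, and the file is assumed to contain no blank lines. Get_Columns numbers start at 1, and values become floats, as they would for undefined variables. The pad argument plays the role of Miss_Vals(0) (or 0) when Resize lengthens a shorter column.

```python
def read_columns(lines, get_columns=None, nskip=0, pad=0):
    """Return one list per selected column, as in Column mode.

    get_columns holds 1-based column numbers; None or [0] selects every column.
    nskip physical records are skipped first. Columns shorter than the
    longest column are padded at the end with pad.
    """
    rows = [line.split() for line in lines[nskip:]]
    ncols = max((len(r) for r in rows), default=0)
    if not get_columns or get_columns == [0]:
        get_columns = list(range(1, ncols + 1))
    columns = [[float(r[c - 1]) for r in rows if len(r) >= c] for c in get_columns]
    # Assumption: short rows lose trailing fields, so column 1 is the longest.
    longest = sum(1 for r in rows if r)
    return [col + [pad] * (longest - len(col)) for col in columns]


monotonic_dat = ["1 2 3 4 5", " 6 7 8 9 10", "11 12 13 14 15", " 16 17 18 19 20"]
var1 = read_columns(monotonic_dat, get_columns=[3])[0]        # [3.0, 8.0, 13.0, 18.0]
col2, col4 = read_columns(monotonic_dat, get_columns=[2, 4], nskip=2)
```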

For an illustration of how column-oriented data can look inside a file, see the PV-WAVE Programmer's Guide.

For more information about how column-oriented data in a file is read into multi-dimensional variables, see Multi-dimensional Variables.

Example 1

The data file shown below is a freely-formatted ASCII file named monotonic.dat:

1 2 3 4 5
 6 7 8 9 10
11 12 13 14 15
 16 17 18 19 20

The function call:

status = DC_READ_FREE('monotonic.dat', var1,  $
    /Column, Get_Columns=[3])

results in var1=[3.0, 8.0, 13.0, 18.0]. Because var1 was not predefined, DC_READ_FREE creates it as a resizable one-dimensional floating-point array.

On the other hand, the commands:

var1 = INTARR(2)
var2 = INTARR(2)
status = DC_READ_FREE('monotonic.dat', var1,  $
    var2, /Column, Get_Columns=[2, 4], Nskip=2)

result in var1=[12, 17] and var2=[14, 19].

Example 2

The data file shown below is a freely-formatted ASCII file named measure.dat:

0   5  10  15  20  25  30  35  40  45  50  56  61  66  71  76  81  86  91

 96 101 107 112 117 122 127 132 137 142 147 152 158 163 168 173 178 183 188

 193 198 203 209 214 219 224 229 234 239 244 249 255 255 255 255 255 255 255

 255 255 255 255 255

The commands:

var1 = INTARR(5)
var2 = INTARR(5)
status = DC_READ_FREE('measure.dat',  $
    var1, var2, Ignore=["$BLANK_LINES"])

result in var1 = [0, 5, 10, 15, 20] and var2 = [25, 30, 35, 40, 45]. The file is interpreted as row-oriented data, since neither the Row nor the Column keyword was specified. All completely blank lines are ignored because Ignore=["$BLANK_LINES"] was specified.

NOTE: If Resize=[2] had also been provided, var2 would have been resizable and would have received all of the remaining values in the file, ending up with 57 elements instead of just 5.

Example 3

The data file shown below is a freely-formatted ASCII file named intake.dat:

151-182-BADY-214-515
316-197-BADX-199-206

The commands:

valve = INTARR(30)
status = DC_READ_FREE('intake.dat',  $
    valve, Miss_Str=["BADX","BADY"],  $
    Miss_Vals=[9999, -9999], Resize=[1],  $
    Delim=['-'])

results in valve=[151, 182, -9999, 214, 515, 316, 197, 9999, 199, 206]. The hyphens in the data act as delimiters because Delim=['-'] was specified. Because valve is resizable, it ends up with 10 elements instead of 30. The two values from Miss_Vals are substituted for the two strings in the file: 9999 for "BADX" and -9999 for "BADY".

Example 4

The data file shown below is a freely-formatted ASCII file named level.dat. This data file uses the semicolon (;) and the slash (/) as delimiters, and the comma (,) to separate the thousands digit from the hundreds digit. Each line contains three logical records, and each logical record ends with a slash:

5,992;17,121/8,348;17,562/5,672;19,451/
5,459;18,659/7,088;17,052/8,541;13,437/
6,362;15,894/8,992;17,509/7,785;14,796/

The commands:

gap = INTARR(20)
bar = INTARR(20)
status = DC_READ_FREE('level.dat', gap, bar,  $
    /Column, Delim=[';', '/'], Filter=[','],  $
    Resize=[1, 2], Vals_Per_Rec=2)

result in:

gap = [5992, 8348, 5672, 5459, 7088, 8541, 6362, 8992, 7785]
bar = [17121, 17562, 19451, 18659, 17052, 13437, 15894, 17509, 14796]

The commas are filtered out of the data because a comma was supplied with the Filter keyword.

Suppose you want gap and bar to be dimensioned as 3-by-3 arrays instead of 1-by-9 vectors. The best way to do this is to read the data with the commands shown above and then use the REFORM function to redimension the variables:

gaparr = REFORM(gap, 3, 3)
bararr = REFORM(bar, 3, 3)

By approaching the data transfer in this way, you avoid asking DC_READ_FREE to transfer two columns of data into the same multi-dimensional variable.

For example, the following commands demonstrate the problem:

gap = INTARR(3, 3)
bar = INTARR(3, 3)
status = DC_READ_FREE('level.dat', gap, bar,  $
    /Column, Delim=[';', '/'], Filter=[','],  $
    Resize=[1, 2], Vals_Per_Rec=2)

results in:

The data is transferred into gap using the rule "the first subscript varies fastest." With Vals_Per_Rec set to 2, no value is available for the third column; hence, every element in the third column is set to 0 (zero). Furthermore, notice that gap gets all the data (it is resizable) and bar gets none of the data.

Example 5

Assume that you have a file, events.dat, that contains some data values and also some chronological information about when those data values were recorded:

01/01/92 5:45:12 10 01-01-92 3276
02/01/92 10:10:10 15.89 06-15-91 99
05/15/91 2:02:02 14.2 12-25-92 876

The date/time templates that will be used to transfer this data have the following definitions:


Number	Template Description
1	MM*DD*YY (* = any delimiter)
-1	HH*MM*SS (* = any delimiter)

To read the date and time from the first two columns into one date/time variable and read the third column of floating point data into another variable, use the following commands:

date1 = REPLICATE({!DT},3)

; The system structure definition of date/time is !DT. Date/time 
; variables must be defined as !DT structure arrays before being 
; used if the date/time data is to be read as such.

status = DC_READ_FREE("events.dat", date1,  $
    date1, float1, /Column,  $
    Dt_Template=[1,-1], Delim=[' '])

; The variable date1 is listed twice; this way, both the date data 
; and the time data can be stored in the same variable, date1.

To see the values of the two variables, you can use the PRINT command:

FOR I = 0,2 DO BEGIN
   PRINT, date1(I), float1(I)   ; Print one row at a time.
ENDFOR

Executing these statements results in the following output:

{ 1992       01       01       05       45       12.00       87402.240       0}       10.0000
{ 1992       02       01       10       10       10.00       87433.424       0}       15.8900
{ 1992       05       15       02       02       02.00       87537.035       0}       14.2000

Because date1 is a structure, curly braces, "{" and "}", are placed around the output. When displaying the value of date1 and float1, PV-WAVE uses default formats for formatting the values, and attempts to place as many items as possible onto each line.

TIP: Another alternative to view the contents of date1 and float1 is to use the DT_PRINT command instead of PRINT.

For more information about the internal organization of the !DT system structure, see the PV-WAVE Programmer's Guide.

To read the first, second, fourth, and fifth columns, define an integer array and another date/time variable, and change the call to DC_READ_FREE as shown below:

calib = INTARR(3)
date2 = REPLICATE({!DT},3)
status = DC_READ_FREE("events.dat", date1,  $
    date1, date2, calib, /Column, Delim=[' '], $
    Get_Columns= [1, 2, 4, 5], Dt_Template =  $
    [1, -1], Ignore=["$BAD_DATE_TIME"])

Notice how the date/time templates are reused. For each new record, Template 1 is used first to read the date data into date1. Next, Template -1 is used to read the time data into date1. Finally, since there is another date/time variable to be read (date2) and there are no more templates left, the template list is reset and Template 1 is used again. The template list is reset for each record.
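The template reuse rule can be written as a short Python function. It is a hypothetical model of the rule just described, not PV-WAVE code. Templates are taken in order for each date/time variable in a record, and the list wraps around when it runs out.

```python
def templates_for_record(templates, n_datetime_vars):
    """Return the template applied to each date/time variable of one record.

    Templates are used in order; when the list is exhausted it starts over.
    Because the list is reset for every record, each record gets the same result.
    """
    return [templates[i % len(templates)] for i in range(n_datetime_vars)]


# Second call in Example 5: date1 (date), date1 (time), date2 (date)
print(templates_for_record([1, -1], 3))   # [1, -1, 1]
```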

NOTE: Because of the internal conversion that DC_READ_FIXED performs to convert the date strings to PV-WAVE's date/time internal structure, the date and time data must be read with the A8 (FORTRAN) or %8s (C) format string.

Normally an error would be reported if the input text to be read as date/time is invalid and cannot be converted. But because the Ignore=["$BAD_DATE_TIME"] keyword was provided, any record containing this type of error is ignored and no error is reported.
