Isis 3 Developer Reference
Isis::CSVReader Class Reference

Reads strings and parses them into tokens separated by a delimiter character.

#include <CSVReader.h>


Public Types

typedef Parser::TokenList CSVAxis
 Row/Column token list.
 
typedef TNT::Array1D< CSVAxis > CSVTable
 Table of all rows/columns.
 
typedef CollectorMap< int, int > CSVColumnSummary
 Column summary for all rows.
 
typedef TNT::Array1D< double > CSVDblVector
 Double array def.
 
typedef TNT::Array1D< int > CSVIntVector
 Integer array def.
 

Public Member Functions

 CSVReader ()
 Default constructor for CSV reader.
 
 CSVReader (const QString &csvfile, bool header=false, int skip=0, const char &delimiter=',', const bool keepEmptyParts=true, const bool ignoreComments=true)
 Parameterized constructor for parsing an input file source.
 
virtual ~CSVReader ()
 Destructor (benign)
 
int size () const
 Reports the total number of lines read from the stream.
 
int rows () const
 Reports the number of rows in the table.
 
int columns () const
 Determine the number of columns in the input source.
 
int columns (const CSVTable &table) const
 Determine the number of columns in a parsed CSV table.
 
void setComment (const bool ignore=true)
 Allows the user to indicate comment disposition.
 
void setSkip (int nskip)
 Indicate the number of lines at the top of the source to skip to data.
 
int getSkip () const
 Reports the number of lines to skip.
 
bool haveHeader () const
 Returns true if a header is present in the input source.
 
void setHeader (const bool gotIt=true)
 Allows the user to indicate header disposition.
 
void setDelimiter (const char &delimiter)
 Set the delimiter character that separates tokens in the strings.
 
char getDelimiter () const
 Reports the character used to delimit tokens in strings.
 
void setKeepEmptyParts ()
 Indicate multiple occurrences of delimiters are empty tokens.
 
void setSkipEmptyParts ()
 Indicate multiple occurrences of delimiters are one separator.
 
bool keepEmptyParts () const
 Returns true when successive delimiters are preserved as empty tokens, false when they are treated as one separator.
 
void read (const QString &fname)
 Reads the entire contents of a file for subsequent parsing.
 
CSVAxis getHeader () const
 Retrieve the header from the input source if it exists.
 
CSVAxis getRow (int index) const
 Parse and return the requested row by index.
 
CSVAxis getColumn (int index) const
 Parse and return a column specified by index order.
 
CSVAxis getColumn (const QString &hname) const
 Parse and return column specified by header name.
 
CSVTable getTable () const
 Parse and return all rows and columns in a table array.
 
bool isTableValid (const CSVTable &table) const
 Indicates if all rows have the same number of columns.
 
CSVColumnSummary getColumnSummary (const CSVTable &table) const
 Computes a summary of how many rows contain each distinct number of columns in table.
 
template<typename T >
TNT::Array1D< T > convert (const CSVAxis &data) const
 Converts a row or column of data to the specified type.
 
void clear ()
 Discards all lines read from an input source.
 

Friends

std::istream & operator>> (std::istream &is, CSVReader &csv)
 Input read operator for input stream sources.
 

Detailed Description

Reads strings and parses them into tokens separated by a delimiter character.

The class reads text strings from an input source stream or file, where each line (string) contains tokens separated by a single-character delimiter. The input is text, and each line is terminated with the newline convention appropriate for the computer system.

This class provides methods that support skipping irrelevant lines and recognizing and utilizing a header line. Tokens within a given line are separated by a single character. Consecutive delimiter characters can be treated as empty tokens (columns) or collapsed into a single separator. Typically, treating consecutive delimiters as empty tokens is appropriate for comma separated values (CSV), whereas space-delimited strings often require runs of spaces to be treated as a single separator. This class supports both cases.
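The two delimiter policies can be illustrated with a small standalone tokenizer. This is a hypothetical sketch using std::string rather than the Isis parser or QString; the function name splitTokens is invented for illustration, and the keepEmpty flag is assumed to correspond to setKeepEmptyParts() (true) and setSkipEmptyParts() (false).

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the Isis API): split one line on a single-character
// delimiter. keepEmpty = true keeps empty tokens between consecutive delimiters
// (like setKeepEmptyParts()); keepEmpty = false drops them, so a run of
// delimiters acts as one separator (like setSkipEmptyParts()).
std::vector<std::string> splitTokens(const std::string &line, char delim,
                                     bool keepEmpty) {
  std::vector<std::string> tokens;
  std::string current;
  for (char c : line) {
    if (c == delim) {
      // End of a token: keep it unless it is empty and empties are skipped.
      if (keepEmpty || !current.empty()) tokens.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  // The text after the last delimiter is the final token.
  if (keepEmpty || !current.empty()) tokens.push_back(current);
  return tokens;
}
```

With this sketch, "a,,b" split on ',' with keepEmpty = true yields three tokens (the middle one empty), while "1   2  3" split on ' ' with keepEmpty = false yields just "1", "2" and "3".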

Comments can exist in a CSV and are indicated with '#' as the first character in the line. Default behavior (as of 2010/04/08) is to ignore these lines as well as blank lines. Use the setComment() method to alter this behavior. Also note that the skip lines count does not include comments or blank lines.
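The interaction between comments, blank lines and the skip count can be sketched as a filter over raw lines. This is a hypothetical illustration, not the Isis implementation. It assumes that only truly empty lines count as blank and that the skip count is applied after comments and blank lines have been removed, as the paragraph above describes. The name filterLines is invented.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch (not Isis code): drop blank lines always, drop '#'
// lines when ignoreComments is true, then apply the skip count to what is left.
// Assumption: only zero-length lines are treated as blank here.
std::vector<std::string> filterLines(const std::vector<std::string> &raw,
                                     bool ignoreComments, int skip) {
  std::vector<std::string> kept;
  for (const std::string &line : raw) {
    if (line.empty()) continue;                      // blank line: always dropped
    if (ignoreComments && line[0] == '#') continue;  // comment line
    kept.push_back(line);
  }
  // The skip count applies only to lines that survived the filtering above.
  std::size_t n =
      skip > 0 ? std::min(static_cast<std::size_t>(skip), kept.size()) : 0;
  kept.erase(kept.begin(), kept.begin() + static_cast<std::ptrdiff_t>(n));
  return kept;
}
```

For the lines "# comment", "junk1", "", "junk2", "h1,h2", "1,2" with skip = 2, the comment and blank line are discarded first, so the two skipped lines are "junk1" and "junk2" and the result starts at the header line.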

Each text line in the input source is read and stored internally. Parsing takes place only when explicitly requested; no parsing is performed while the input source is being read. This approach allows users of this class to alter or otherwise adjust parsing conditions after the input source has been internalized, which makes the implementation efficient and flexible and delegates more control to the users of this class.

The mechanism by which parsed data is stored and returned to the caller's environment makes this class efficient. The returned rows, columns and tables use memory reference counting, so parsed data can be exported to the calling environment at virtually no cost. It does, however, raise usage issues: reference counting means that all copies of a parsed row, column or table refer to the same data, and a change made through one copy is reflected in every copy of that element. Note that this concern rests entirely on how the caller's environment uses the returned data, as only the original lines read from the input source are maintained internally.
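The sharing behavior described above can be mimicked with std::shared_ptr. The struct below is a hypothetical stand-in for a reference-counted array such as TNT::Array1D, not the actual TNT type. It shows that a plain copy aliases the same tokens, while an explicit deep copy (the invented deepCopy() member) gives the caller an independent row.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference-counted row: copies share one buffer,
// so a write through one copy is visible through every other copy.
struct SharedRow {
  std::shared_ptr<std::vector<std::string>> data;

  explicit SharedRow(std::vector<std::string> tokens)
      : data(std::make_shared<std::vector<std::string>>(std::move(tokens))) {}

  // Element access goes through the shared buffer.
  std::string &operator[](std::size_t i) { return (*data)[i]; }

  // Explicit deep copy for callers that need an independent row.
  SharedRow deepCopy() const { return SharedRow(*data); }
};
```

Assigning one SharedRow to another and then modifying an element through the copy changes the original as well; only deepCopy() breaks the link. Callers who intend to modify returned data should make an independent copy first.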

The following example demonstrates how to use this class to read a comma-delimited file whose consecutive commas should be treated as empty columns. Furthermore, there are 2 lines to skip, followed by a header line:

cout << "\n\nProcessing comma table...\n";
QString csvfile("comma.csv");
// header = true, skip = 2, delimiter = ',', keepEmptyParts = true
CSVReader csv(csvfile, true, 2, ',', true);

Another way to ingest this file, using methods instead of the constructor, is as follows:

cout << "\n\nProcessing comma table using methods...\n";
QString csvfile("comma.csv");
CSVReader csv;            // default constructor; nothing is read yet
csv.setSkip(2);
csv.setHeader(true);
csv.setDelimiter(',');
csv.setKeepEmptyParts();
csv.read(csvfile);

Calling read() always purges any previously read data from the CSVReader object.

Author
2006-08-14 Kris Becker

Member Typedef Documentation

◆ CSVAxis

typedef Parser::TokenList Isis::CSVReader::CSVAxis

Row/Column token list.

◆ CSVColumnSummary

typedef CollectorMap<int, int> Isis::CSVReader::CSVColumnSummary

Column summary for all rows.

◆ CSVDblVector

typedef TNT::Array1D<double> Isis::CSVReader::CSVDblVector

Double array def.

◆ CSVIntVector

typedef TNT::Array1D<int> Isis::CSVReader::CSVIntVector

Integer array def.

◆ CSVTable

typedef TNT::Array1D<CSVAxis> Isis::CSVReader::CSVTable

Table of all rows/columns.

Constructor & Destructor Documentation

◆ CSVReader() [1/2]

Isis::CSVReader::CSVReader ( )

Default constructor for CSV reader.

The default constructor sets up to read a source that has no header and skips no lines. It also sets the delimiter to the comma, as implied by the name (CSV = comma separated values), and treats multiple successive occurrences of the delimiting character as individual tokens (keeping empty parts).

This constructor can be used when deferring the reading of the input source. Other methods in this class can adjust the parsing behavior both before and after the source is read, because parsing is performed on demand. This means a single input source can be parsed repeatedly after adjusting parameters.
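The read-now, parse-later design can be reduced to a tiny class that stores raw lines and splits them only when a row is requested. This is a hypothetical miniature (the class LazyLines is invented, and it uses std::string instead of QString). It demonstrates that changing the delimiter after the lines are stored changes the result of the next parse. For brevity it splits with std::getline, which drops a trailing empty token; the real class's tokenizer may behave differently.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical miniature of deferred parsing (not the Isis class): raw lines
// are stored untouched and split only when a row is requested.
class LazyLines {
 public:
  void add(const std::string &line) { lines_.push_back(line); }
  void setDelimiter(char d) { delim_ = d; }

  // Parse on demand with the delimiter that is current at call time.
  std::vector<std::string> row(std::size_t i) const {
    std::vector<std::string> out;
    if (i >= lines_.size()) return out;  // invalid index: empty row
    std::stringstream ss(lines_[i]);
    std::string tok;
    while (std::getline(ss, tok, delim_)) out.push_back(tok);
    return out;
  }

 private:
  std::vector<std::string> lines_;
  char delim_ = ',';
};
```

Storing "a,b c" once and parsing it with ',' gives "a" and "b c"; switching the delimiter to ' ' and parsing the same stored line again gives "a,b" and "c", without re-reading the source.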

◆ CSVReader() [2/2]

Isis::CSVReader::CSVReader ( const QString &  csvfile,
bool  header = false,
int  skip = 0,
const char &  delimiter = ',',
const bool  keepEmptyParts = true,
const bool  ignoreComments = true 
)

Parameterized constructor for parsing an input file source.

This constructor can be used when the input source is an identified file. Parameters are available for specifying the parsing behavior, but they are not required here since defaults are provided. Other methods in this class can set parsing conditions after the input file has been read in.

If the file cannot be opened or an error is encountered while reading the file, an Isis exception is thrown.

All lines are read in from the file and stored for subsequent parsing. Therefore, parsing can be performed at any time upon returning from this constructor.

Parameters
csvfile Name of file to open and read
header Indicates if a header exists (true) in the file or not (false)
skip Number of lines to skip to the header, if it exists, or to the first data line
delimiter Indicates the character to be used to delimit each token in the string/line
keepEmptyParts Indicates successive delimiters are to be treated as empty tokens (true) or collapsed into one token (false)
ignoreComments Indicates whether lines beginning with '#' are ignored as comments (true) or kept as data (false)

References read().

◆ ~CSVReader()

virtual Isis::CSVReader::~CSVReader ( )
inline virtual

Destructor (benign)

Member Function Documentation

◆ clear()

void Isis::CSVReader::clear ( )
inline

Discards all lines read from an input source.

This method discards all lines read from any previous stream. Any subsequent row or column requests will return an empty result.

◆ columns() [1/2]

int Isis::CSVReader::columns ( ) const

Determine the number of columns in the input source.

This method applies the parsing conditions to all data lines to determine the number of columns. Note that it is assumed that all lines contain the same number of columns.

If the number of columns varies among the lines, the smallest number of columns found in any line is returned, due to the way the columns are determined.

See also
isTableValid().

Note that this can be an expensive operation if the input source is large, as all lines are parsed. The header is not included.

See also
columns(const CSVReader::CSVTable &table) for an alternative and more efficient method. That method takes a previously parsed table of all lines as an argument, which is precisely how this method determines the columns.
Returns
int Number of columns in table, smallest column count if some lines are different
See also
getColumnSummary()

References getTable(), and rows().

Referenced by Isis::LoadCSV::load().

◆ columns() [2/2]

int Isis::CSVReader::columns ( const CSVTable table) const

Determine the number of columns in a parsed CSV table.

This method computes the number of columns from a CSVTable. This table is the result of the getTable method.

It is assumed each row in the table has the same number of columns after parsing. If one or more of the rows contain a different number of columns, only the smallest column count is reported.

Parameters
table The CSVTable whose rows are examined
Returns
int Number of columns in table, smallest column count if some lines are different
See also
getColumnSummary()

References getColumnSummary(), Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::key(), and Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::size().

◆ convert()

template<typename T >
TNT::Array1D< T > Isis::CSVReader::convert ( const CSVAxis data) const

Converts a row or column of data to the specified type.

This method converts a row or column of data to the specified type. Since this is a template method, it must be invoked explicitly through template syntax. Here is an example that extracts a column by header name and converts it to a double precision array:

// Convert column 0/1 to double
CSVReader::CSVAxis scol = csv.getColumn("0/1");
CSVReader::CSVDblVector dcol = csv.convert<double>(scol);

At present, this class uses the QString class as its token storage type (TokenType). All that is required is that it have a cast operator for a given type. If QString has the operator, it can be invoked for that type. The precise statement used to convert the token to the explicit type is:

out[i] = (T) s;

In this example, s is the individual token and T is the type double as in the previous example.

Note that conversions of specific special pixel values are not inherently handled by this method. If you anticipate textual representations of special pixels, such as NULL, LIS, etc., handling them is left to the caller.

Parameters
data Input row or column
Returns
TNT::Array1D<T> Converted data array of specified type

References Isis::toDouble().
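A standalone analogue of convert<T>() can be written with std::istringstream. This is a hypothetical sketch (the name convertTokens is invented, and the Isis method is documented above as a cast on each token, not a stream extraction). As with the real method, special-pixel names such as "NULL" are not recognized; in this sketch any token that fails to parse becomes a value-initialized T, so the caller should handle such tokens before converting.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical analogue of convert<T>(): parse each token with operator>>.
// Tokens that fail to parse become T{} (for example 0.0 for double).
template <typename T>
std::vector<T> convertTokens(const std::vector<std::string> &tokens) {
  std::vector<T> out;
  out.reserve(tokens.size());
  for (const std::string &s : tokens) {
    std::istringstream in(s);
    T value{};
    if (!(in >> value)) value = T{};  // unparseable token
    out.push_back(value);
  }
  return out;
}
```

Called as convertTokens<double>({"1.5", " 2", "x"}), the sketch returns 1.5, 2.0 and 0.0; the explicit template argument is required, just as with csv.convert<double>(scol).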

◆ getColumn() [1/2]

CSVReader::CSVAxis Isis::CSVReader::getColumn ( int  index) const

Parse and return a column specified by index order.

This method extracts a column from each row and returns the result. Parsing rules are applied to each row, and the token at the column index is extracted and placed in the returned array. The length of the array always equals the number of data rows in the input source (excluding skipped lines and the header, if they exist).

It is assumed that every row has the same number of columns (see isTableValid()), but if the requested column does not exist in a row (or in any row, for that matter), a default-constructed token is returned for that row. If the requested index is less than 0, an empty column is returned.

Columns are 0-based, so valid indices range from 0 to (columns() - 1).

Parameters
index Zero-based column index to parse and return
Returns
CSVReader::CSVAxis Array containing the token at the requested column from each row

References Isis::CSVParser< TokenStore >::parse(), rows(), and Isis::CSVParser< TokenStore >::size().

Referenced by getColumn(), and Isis::LoadCSV::load().
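The column-extraction rules above (one entry per row, a default-constructed token for short rows, an empty result for a negative index) can be sketched over an already-split table. This is a hypothetical helper named columnAt, using std::string tokens instead of the Isis token type.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical sketch of the documented getColumn(int) rules.
std::vector<std::string> columnAt(const std::vector<Row> &rows, int index) {
  std::vector<std::string> column;
  if (index < 0) return column;  // negative index: empty column
  column.reserve(rows.size());
  for (const Row &r : rows) {
    if (static_cast<std::size_t>(index) < r.size())
      column.push_back(r[index]);
    else
      column.push_back(std::string());  // row too short: default token
  }
  return column;
}
```

For rows {"a", "b"} and {"c"}, column 1 is {"b", ""}: the result always has one entry per row, even when a row is too short.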

◆ getColumn() [2/2]

CSVReader::CSVAxis Isis::CSVReader::getColumn ( const QString &  hname) const

Parse and return column specified by header name.

This method parses and extracts the column that corresponds to the named column in the header. It returns a zero-length array if a header does not exist for this input source or if the named column does not exist.

The header is parsed using the same rules as each row. It is the responsibility of the user of this class to specify the existence of a header. Once the header is parsed, a case-insensitive search of the names is performed until the requested column name is found. The index of this header name is then used to extract the column from each row.

It is assumed the column exists in each row. If it does not, a default-constructed token is returned for each row that lacks the column.

Parameters
hname Name of the column as it exists in the header
Returns
CSVReader::CSVAxis Column array parsed from each row

References getColumn(), and getHeader().
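The name lookup step can be sketched as a case-insensitive search that yields a column index, or -1 when the name is absent. The helpers equalsIgnoreCase and headerIndex are hypothetical and compare ASCII characters with std::tolower. Because getHeader() does not trim whitespace, a padded header token such as " Lat" does not match "lat" in this sketch; the real comparison may differ in its handling of non-ASCII text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical ASCII case-insensitive string comparison.
bool equalsIgnoreCase(const std::string &a, const std::string &b) {
  if (a.size() != b.size()) return false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i])))
      return false;
  }
  return true;
}

// Index of the first header name matching `name`, or -1 if none matches.
int headerIndex(const std::vector<std::string> &header,
                const std::string &name) {
  for (std::size_t i = 0; i < header.size(); ++i)
    if (equalsIgnoreCase(header[i], name)) return static_cast<int>(i);
  return -1;
}
```

The returned index would then be passed to the index-based column extraction; a result of -1 corresponds to the documented zero-length return.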

◆ getColumnSummary()

CSVReader::CSVColumnSummary Isis::CSVReader::getColumnSummary ( const CSVTable table) const

Computes a row summary of the number of distinct columns in table.

A CSVColumnSummary is a CollectorMap where the key is the number of columns and the value is the number of rows that contain that number of columns. This is useful for determining the consistency of a parsed input source, that is, whether every row contains the same number of columns.

For a consistent source, the summary contains one and only one element: its key is the column count shared by every row, and its value is the number of rows.

This example shows how to determine this information:

CSVReader::CSVTable table = csv.getTable();
CSVReader::CSVColumnSummary summary = csv.getColumnSummary(table);
cout << "Number of columns: " << csv.columns(table) << endl;
cout << "Number distinct columns: " << summary.size() << endl;
for (int ncols = 0 ; ncols < summary.size() ; ncols++) {
cout << "--> " << summary.getNth(ncols) << " rows have "
<< summary.key(ncols) << " columns." << endl;
}
Parameters
table Input table as returned by the getTable method
Returns
CSVReader::CSVColumnSummary A CollectorMap that indicates the number of rows with each distinct number of columns
See also
getTable()
isTableValid()

References Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::add(), Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::exists(), and Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::get().

Referenced by columns(), and isTableValid().
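The relationship between the column summary, columns(table) and isTableValid() can be sketched with std::map in place of CollectorMap<int, int>. The names columnSummary, minColumns and tableValid are hypothetical. Because std::map keeps its keys sorted, the first key is the smallest column count, which matches the documented return value of columns(table). Returning 0 for an empty table, and reporting an empty table as invalid, are assumptions of this sketch.

```cpp
#include <map>
#include <string>
#include <vector>

using Row = std::vector<std::string>;
using Summary = std::map<int, int>;  // column count -> number of rows

// Count how many rows have each distinct number of columns.
Summary columnSummary(const std::vector<Row> &table) {
  Summary summary;
  for (const Row &r : table) ++summary[static_cast<int>(r.size())];
  return summary;
}

// Smallest column count (first sorted key); 0 for an empty table.
int minColumns(const Summary &s) { return s.empty() ? 0 : s.begin()->first; }

// Valid when every row has the same number of columns.
bool tableValid(const Summary &s) { return s.size() == 1; }
```

For rows of 2, 2 and 1 columns, the summary is {1: 1, 2: 2}, so the smallest column count is 1 and the table is not valid.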

◆ getDelimiter()

char Isis::CSVReader::getDelimiter ( ) const
inline

Reports the character used to delimit tokens in strings.

Returns
char Current character used to delimit tokens

◆ getHeader()

CSVReader::CSVAxis Isis::CSVReader::getHeader ( ) const

Retrieve the header from the input source if it exists.

This method returns the header, if it exists, after applying the parsing rules.

The existence of the header is determined entirely by the user of this class. If the header does not exist, a zero-length array is returned.

Note that this routine does not trim leading or trailing whitespace from each header. This must be handled by the caller.

Returns
CSVReader::CSVAxis Array containing the elements of the header
See also
haveHeader()
setHeader()

References rows().

Referenced by getColumn(), and Isis::LoadCSV::load().
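Since getHeader() leaves leading and trailing whitespace in place, a caller would typically trim each name before using it. The helpers below (trimmed and trimmedHeader) are hypothetical std::string versions; with the actual QString tokens, QString::trimmed() serves the same purpose.

```cpp
#include <string>
#include <vector>

// Remove leading and trailing spaces, tabs, carriage returns and newlines.
std::string trimmed(const std::string &s) {
  const char *ws = " \t\r\n";
  std::string::size_type first = s.find_first_not_of(ws);
  if (first == std::string::npos) return std::string();  // all whitespace
  std::string::size_type last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

// Trim every header name returned by the parser.
std::vector<std::string> trimmedHeader(const std::vector<std::string> &header) {
  std::vector<std::string> out;
  out.reserve(header.size());
  for (const std::string &h : header) out.push_back(trimmed(h));
  return out;
}
```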

◆ getRow()

CSVReader::CSVAxis Isis::CSVReader::getRow ( int  index) const

Parse and return the requested row by index.

This method will parse and return the requested row from the input source as an array. If the requested row is determined to be an invalid index, then a zero-length array is returned. It is up to the caller to check for validity of the returned row array.

Parameters
index Index of the desired row to return
Returns
CSVReader::CSVAxis Array of tokens after parsing rules are applied

References rows().

Referenced by Isis::LoadCSV::load().

◆ getSkip()

int Isis::CSVReader::getSkip ( ) const
inline

Reports the number of lines to skip.

This is the number of lines to skip to get to the header, if one exists, or to the first row of data to parse.

Returns
int Number of lines to skip

◆ getTable()

CSVReader::CSVTable Isis::CSVReader::getTable ( ) const

Parse and return all rows and columns in a table array.

This method returns a 2-D table of all rows and columns after parsing rules are applied. Each column or token in each row is returned as a CSVParser::TokenType. Subsequent conversion can be performed if the type sufficiently supports it, or the user can apply their own conversion techniques.

The validity of the table with regard to column integrity (same number of columns in each row) can be checked with the isTableValid method. A summary of the number of rows containing differing numbers of columns is provided by the getColumnSummary method.

The returned table does not include the header row or any skipped rows. An empty (zero-length) table is returned if no rows are present.

The table itself is a 1-dimensional array that contains a row at each element, which conceptually forms a 2-dimensional table. Each element in the row (first) dimension of the table is a CSVAxis array containing the parsed columns or tokens. Note that the number of columns may vary from row to row.

Returns
CSVReader::CSVTable 2-D table of parsed columns in each row

References Isis::CSVParser< TokenStore >::parse(), Isis::CSVParser< TokenStore >::result(), and rows().

Referenced by columns().

◆ haveHeader()

bool Isis::CSVReader::haveHeader ( ) const
inline

Returns true if a header is present in the input source.

The existence of a header line is always determined by the user of this class. See the setHeader() method for additional information on header maintenance.

Returns
bool True if the input source has a header, false otherwise

◆ isTableValid()

bool Isis::CSVReader::isTableValid ( const CSVTable table) const

Indicates if all rows have the same number of columns.

This method checks the integrity of all rows in the input source, that is, whether they all have the same number of columns.

Parameters
table Input table to check for integrity/validity
Returns
bool True if all rows have the same number of columns, false if they do not

References getColumnSummary(), and Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::size().

◆ keepEmptyParts()

bool Isis::CSVReader::keepEmptyParts ( ) const
inline

Returns true when successive delimiters are preserved as empty tokens, false when they are treated as one separator.

See also
setKeepEmptyParts()
setSkipEmptyParts()
Returns
bool True if empty parts are kept, false if they are skipped

◆ read()

void Isis::CSVReader::read ( const QString &  csvfile)

Reads the entire contents of a file for subsequent parsing.

This method opens the specified file and reads every line, storing the lines in this object. It is assumed this file is a text file. Other methods in this class can be used to set parsing conditions before or after the file has been read.

Note that parsing the file is deferred until explicitly invoked through other methods in this class. Users of this class can extract individual rows, columns or the complete table.

This object is reentrant. Additional files can be read in. Any existing data from previous input sources is discarded upon subsequent reads.

Parameters
csvfile Name of file to read

References _FILEINFO_, and Isis::IException::User.

Referenced by CSVReader(), and Isis::LoadCSV::load().

◆ rows()

int Isis::CSVReader::rows ( ) const
inline

Reports the number of rows in the table.

This method returns only the number of rows of data. This count does not include skipped lines or the header line if either exists. Note that if no lines are skipped and no header exists, this count will be identical to size().

Returns
int Number of rows of data from the input source

Referenced by columns(), getColumn(), getHeader(), getRow(), getTable(), and Isis::LoadCSV::load().
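The difference between rows() and size() amounts to simple bookkeeping. The functions below are a hypothetical sketch of that arithmetic, assuming the stored lines already exclude ignored comments and blank lines and that data rows are indexed from 0. Clamping a negative count to 0 and returning -1 for an invalid row are choices made for this illustration only.

```cpp
// Hypothetical: number of data rows left after skipped lines and the header.
int dataRowCount(int totalLines, int skip, bool hasHeader) {
  int n = totalLines - skip - (hasHeader ? 1 : 0);
  return n > 0 ? n : 0;
}

// Hypothetical: position among the stored lines of data row `row`
// (0-based), or -1 when the row does not exist.
int lineIndexForRow(int row, int totalLines, int skip, bool hasHeader) {
  if (row < 0 || row >= dataRowCount(totalLines, skip, hasHeader)) return -1;
  return skip + (hasHeader ? 1 : 0) + row;
}
```

With 10 stored lines, 2 skipped lines and a header, there are 7 data rows and the first one is stored line 3. With no skipped lines and no header, the row count equals the line count, matching the note that rows() then equals size().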

◆ setComment()

void Isis::CSVReader::setComment ( const bool  ignore = true)
inline

Allows the user to indicate comment disposition.

Comments are indicated in a CSV file by a '#' sign in the first column. If they are present, the default is to ignore them and discard them when they are read in. This method allows the user to specify how to treat lines that begin with a '#' in the off chance they are legitimate data.

Comment lines are not counted in the skip lines parameter unless ignore is set to false; in that case, the skipped lines include any lines that start with a '#'.

Also note that any and all blank/empty lines are discarded and not included in any count, including the skip line count.

Parameters
ignore True indicates lines that start with a '#' are considered comments and are discarded. False will not discard these lines but include them in the parsing content.

Referenced by Isis::LoadCSV::load().

◆ setDelimiter()

void Isis::CSVReader::setDelimiter ( const char &  delimiter)
inline

Set the delimiter character that separates tokens in the strings.

This method allows the user of this class to indicate the character that separates individual tokens in each row, including the header line.

One must ensure the delimiter character does not appear within tokens (for example, a comma inside a text field of a comma-delimited file), or incorrect parsing will occur.

Parameters
delimiter Single character that delimits tokens in each string

Referenced by Isis::LoadCSV::load().
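The caveat about delimiters inside tokens can be made concrete with a plain single-character split that has no notion of quoting, which is what the warning above implies for this class. The function naiveSplit is a hypothetical std::string illustration, not Isis code: a quoted field containing a comma produces an extra token, while choosing a delimiter that never occurs inside the data avoids the problem.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain single-character split with no quote handling.
std::vector<std::string> naiveSplit(const std::string &line, char delim) {
  std::vector<std::string> tokens;
  std::stringstream ss(line);
  std::string tok;
  while (std::getline(ss, tok, delim)) tokens.push_back(tok);
  return tokens;
}
```

Splitting "\"Smith, John\",42" on ',' yields three tokens ("\"Smith", " John\"" and "42") instead of the intended two, whereas the same record written with ';' as the delimiter splits into two tokens.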

◆ setHeader()

void Isis::CSVReader::setHeader ( const bool  gotIt = true)
inline

Allows the user to indicate header disposition.

The determination of a header is entirely up to the user of this class. If a header exists, the user must indicate this with a true parameter to this method. That line is excluded from the row-by-row and column data parsing operations. If no header exists, provide false to this method.

It is assumed that the header immediately precedes the data rows and that any skipped lines precede the header line. Only one line is presumed to be a header.

Note that this method can be called at any time in the process of reading from a file or stream source, as parsing is done on demand and not at the time the source is read in.

Parameters
gotIt True indicates the presence of a header, false indicates one does not exist.

Referenced by Isis::LoadCSV::load().

◆ setKeepEmptyParts()

void Isis::CSVReader::setKeepEmptyParts ( )
inline

Indicate multiple occurrences of delimiters are empty tokens.

Use of this method indicates that when multiple instances of the delimiting character occur in succession, they should be treated as empty tokens. This is useful when input sources truly have empty fields.

◆ setSkip()

void Isis::CSVReader::setSkip ( int  nskip)
inline

Indicate the number of lines at the top of the source to skip to data.

This method allows the user to indicate the number of lines that are to be ignored at the beginning of the input source. These lines may contain any text, but are persistently ignored for all row and column parsing operations.

Note that this count should not include a header line if one exists, as the header methods maintain that information for parsing operations. It is assumed that a header line always follows the skipped lines and immediately precedes the data lines.

This count does not include comment lines (first character is a '#') when comments are ignored (the default), nor does it include blank lines.

Parameters
nskip Number of lines to skip

Referenced by Isis::LoadCSV::load().

◆ setSkipEmptyParts()

void Isis::CSVReader::setSkipEmptyParts ( )
inline

Indicate multiple occurrences of delimiters are one separator.

Use of this method indicates that when multiple instances of the delimiting character occur in succession, they should be treated as a single separator. This is useful when input sources have space-separated tokens. Frequently, there are many spaces between values when spaces are used as the delimiting character. Call this method when spaces are used as token delimiters.

Referenced by Isis::LoadCSV::load().

◆ size()

int Isis::CSVReader::size ( ) const
inline

Reports the total number of lines read from the stream.

Returns
int Number of lines read from input source

Friends And Related Function Documentation

◆ operator>>

std::istream& operator>> ( std::istream &  is,
CSVReader csv 
)
friend

Input read operator for input stream sources.

This input operator can be invoked directly from the user's environment to read the complete input source. It can also be used to augment an existing source, as this technique does not discard existing data (lines).

It is presumed that any additional input sources are consistent with the pre-established parsing guidelines; otherwise, the integrity of the table is compromised.

Here is an example of how to use this method:

CSVReader csv;
std::ifstream ifile("myfile.csv");
ifile >> csv;
Parameters
is Input stream source
csv CSVReader object that receives the lines read from the input source
Returns
std::istream& Returns the state of the input stream at EOF or error
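The contrast between read(), which discards previously stored lines, and operator>>, which appends to them, can be sketched with two stream-based helpers. These are hypothetical std::string functions (appendLines and replaceLines are invented names). They drop blank lines and lines starting with '#' to mirror the default comment handling, which is an assumption about where that filtering happens.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical analogue of operator>>: append lines, keeping existing ones.
void appendLines(std::istream &is, std::vector<std::string> &lines) {
  std::string line;
  while (std::getline(is, line)) {
    if (line.empty() || line[0] == '#') continue;  // blank or comment line
    lines.push_back(line);
  }
}

// Hypothetical analogue of read(): discard the previous source first.
void replaceLines(std::istream &is, std::vector<std::string> &lines) {
  lines.clear();
  appendLines(is, lines);
}
```

Appending two sources accumulates their data lines in one store, so they must share the same parsing conventions; replacing starts over with only the new source.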
