Isis 3 Programmer Reference
Reads strings and parses them into tokens separated by a delimiter character.
#include <CSVReader.h>
Public Types
typedef Parser::TokenList | CSVAxis
Row/Column token list.
typedef TNT::Array1D< CSVAxis > | CSVTable
Table of all rows/columns.
typedef CollectorMap< int, int > | CSVColumnSummary
Column summary for all rows.
typedef TNT::Array1D< double > | CSVDblVector
Double array def.
typedef TNT::Array1D< int > | CSVIntVector
Integer array def.
Public Member Functions
CSVReader ()
Default constructor for CSV reader.
CSVReader (const QString &csvfile, bool header=false, int skip=0, const char &delimiter=',', const bool keepEmptyParts=true, const bool ignoreComments=true)
Parameterized constructor for parsing an input file source.
virtual | ~CSVReader ()
Destructor (benign)
int | size () const
Reports the total number of lines read from the stream.
int | rows () const
Reports the number of rows in the table.
int | columns () const
Determine the number of columns in the input source.
int | columns (const CSVTable &table) const
Determine the number of columns in a parser CSV Table.
void | setComment (const bool ignore=true)
Allows the user to indicate comment disposition.
void | setSkip (int nskip)
Indicate the number of lines at the top of the source to skip to data.
int | getSkip () const
Reports the number of lines to skip.
bool | haveHeader () const
Returns true if a header is present in the input source.
void | setHeader (const bool gotIt=true)
Allows the user to indicate header disposition.
void | setDelimiter (const char &delimiter)
Set the delimiter character that separates tokens in the strings.
char | getDelimiter () const
Reports the character used to delimit tokens in strings.
void | setKeepEmptyParts ()
Indicate multiple occurrences of delimiters are empty tokens.
void | setSkipEmptyParts ()
Indicate multiple occurrences of delimiters are one token.
bool | keepEmptyParts () const
Returns true when successive delimiters are preserved as empty tokens, false when they are treated as one separator.
void | read (const QString &fname)
Reads the entire contents of a file for subsequent parsing.
CSVAxis | getHeader () const
Retrieve the header from the input source if it exists.
CSVAxis | getRow (int index) const
Parse and return the requested row by index.
CSVAxis | getColumn (int index) const
Parse and return a column specified by index order.
CSVAxis | getColumn (const QString &hname) const
Parse and return column specified by header name.
CSVTable | getTable () const
Parse and return all rows and columns in a table array.
bool | isTableValid (const CSVTable &table) const
Indicates if all rows have the same number of columns.
CSVColumnSummary | getColumnSummary (const CSVTable &table) const
Computes a row summary of the number of distinct columns in table.
template<typename T > TNT::Array1D< T > | convert (const CSVAxis &data) const
Converts a row or column of data to the specified type.
void | clear ()
Discards all lines read from an input source.
Private Types
typedef CSVParser< QString > | Parser
Defines single line parser.
typedef std::vector< QString > | CSVList
Input source line container.
Private Member Functions
int | firstRowIndex () const
Computes the index of the first data row.
std::istream & | load (std::istream &ifile)
Reads all lines from the input stream until an EOF is encountered.
Private Attributes
bool | _header
Indicates presence of header.
int | _skip
Number of lines to skip.
char | _delimiter
Separator of values.
bool | _keepParts
Keep empty parts between delimiters.
CSVList | _lines
List of lines from file.
bool | _ignoreComments
Ignore comments on read.
Friends
std::istream & | operator>> (std::istream &is, CSVReader &csv)
Input read operator for input stream sources.
Reads strings and parses them into tokens separated by a delimiter character.
The class reads text strings from an input source stream or file in which each line (string) is split into tokens by a single-character delimiter. The input stream is text in nature, and each line is terminated with a newline as appropriate for the computer system.
This class provides methods that support skipping irrelevant lines and recognizing and utilizing a header line. Tokens within a given line are separated by a single character. Consecutive delimiter characters can be treated as empty tokens (columns) or collapsed into a single separator. Typically, treating consecutive delimiters as empty tokens is used for comma separated values (CSV), whereas space delimited strings often require multiple spaces to be treated as a single separator. This class supports both cases.
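The two delimiter policies can be illustrated with a minimal, self-contained sketch. The function splitTokens below is hypothetical and uses only the C++ standard library; it is not the CSVParser implementation. In particular, how CSVParser treats leading or trailing delimiters is not stated here, so the sketch simply drops every empty token when empty parts are skipped.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `line` on a single delimiter character.
// keepEmptyParts == true : every delimiter ends a token, so empty tokens survive.
// keepEmptyParts == false: runs of delimiters act as one separator.
std::vector<std::string> splitTokens(const std::string &line, char delimiter,
                                     bool keepEmptyParts) {
  std::vector<std::string> tokens;
  std::string current;
  for (char c : line) {
    if (c == delimiter) {
      // Close the current token; empty ones are stored only in keep mode.
      if (keepEmptyParts || !current.empty()) tokens.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  // Flush the final token under the same rule.
  if (keepEmptyParts || !current.empty()) tokens.push_back(current);
  return tokens;
}
```

With these assumptions, splitting "a,,b" on ',' with empty parts kept yields three tokens (the middle one empty), while splitting "1   2  3" on ' ' with empty parts skipped yields three tokens with no empties.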
Comments can exist in a CSV and are indicated with '#' as the first character in the line. Default behavior (as of 2010/04/08) is to ignore these lines as well as blank lines. Use the setComment() method to alter this behavior. Also note that the skip lines count does not include comments or blank lines.
Each text line in the input source is read and stored in an internal stack. Parsing takes place only when explicitly requested; no parsing is performed while the input source is read. This approach allows users of this class to alter or otherwise adjust parsing conditions after the input source has been internalized, which makes the implementation efficient and flexible and delegates more control to the users of this class.
The mechanism by which parsed data is stored and returned to the caller's environment makes this class efficient. The returned rows, columns and tables use memory reference counting. This allows parsed data to be exported to the calling environment at virtually no cost. It does, however, require care in use. Reference counting means that all instances of a parsed row, column or table refer to the same copy of the data, and a change in one instance is reflected in all instances of that same row, column or table. Note that this concern rests entirely on how the caller's environment uses returned data, as only the original lines read from the input source are maintained internally by objects.
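The sharing behavior described above can be demonstrated with a small stand-in. SharedRow is a hypothetical class built on std::shared_ptr; it is not the TNT::Array1D type used by this class, and it only mimics the property that copies refer to one buffer.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference-counted row: copies share one buffer.
class SharedRow {
 public:
  explicit SharedRow(std::vector<std::string> tokens)
      : data_(std::make_shared<std::vector<std::string>>(std::move(tokens))) {}

  std::string &operator[](std::size_t i) { return (*data_)[i]; }
  const std::string &operator[](std::size_t i) const { return (*data_)[i]; }

  // Deep copy: the result owns an independent buffer.
  SharedRow clone() const { return SharedRow(*data_); }

 private:
  std::shared_ptr<std::vector<std::string>> data_;
};
```

Copying a SharedRow and modifying the copy changes the original as well, whereas modifying the result of clone() does not. Callers that need to edit returned data without side effects should take a deep copy first; TNT arrays are understood to offer a copy() member for this, which should be confirmed against the TNT headers in use.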
To read a comma delimited file in which consecutive commas should be treated as empty columns, with 2 lines to skip followed by a header line, construct the reader with the file name, header = true, skip = 2, delimiter = ',' and keepEmptyParts = true.
Another way to ingest this file, using methods instead of the constructor, is to default-construct the reader, call setSkip(2), setHeader(true), setDelimiter(',') and setKeepEmptyParts(), and then call read() with the file name.
Calling read() always purges any previously read data from the CSVReader object.
2008-06-18 Christopher Austin - Fixed documentation
2010-04-08 Kris Becker - Added discarding of comment and blank lines
Definition at line 255 of file CSVReader.h.
typedef Parser::TokenList Isis::CSVReader::CSVAxis
Row/Column token list.
Definition at line 263 of file CSVReader.h.
typedef CollectorMap<int, int> Isis::CSVReader::CSVColumnSummary
Column summary for all rows.
Definition at line 265 of file CSVReader.h.
typedef TNT::Array1D<double> Isis::CSVReader::CSVDblVector
Double array def.
Definition at line 267 of file CSVReader.h.
typedef TNT::Array1D<int> Isis::CSVReader::CSVIntVector
Integer array def.
Definition at line 268 of file CSVReader.h.
typedef std::vector<QString> Isis::CSVReader::CSVList
private
Input source line container.
Definition at line 489 of file CSVReader.h.
typedef TNT::Array1D<CSVAxis> Isis::CSVReader::CSVTable
Table of all rows/columns.
Definition at line 264 of file CSVReader.h.
typedef CSVParser<QString> Isis::CSVReader::Parser
private
Defines single line parser.
Definition at line 258 of file CSVReader.h.
Isis::CSVReader::CSVReader()
Default constructor for CSV reader.
The default constructor sets up to read a source that has no header and skips no lines. It also sets the delimiter to the comma, as implied by its name (CSV = comma separated value), and treats multiple successive occurrences of the delimiting character as individual tokens (keeping empty parts).
This method can be used when deferring the reading of the input source. Other methods available in this class can be used to adjust the behavior of the parsing before and after reading of the source, as parsing is performed on demand. This means a single input source can be parsed repeatedly after adjusting parameters.
Definition at line 51 of file CSVReader.cpp.
Isis::CSVReader::CSVReader(const QString &csvfile, bool header = false, int skip = 0, const char &delimiter = ',', const bool keepEmptyParts = true, const bool ignoreComments = true)
constructor
Parameterized constructor for parsing an input file source.
This constructor can be used when the input source is an identified file. Parameters are available for specifying the parsing behavior, but are not necessarily required here as defaults are provided. Other methods in this class can set parsing conditions after the input file has been read in.
If the file cannot be opened or an error is encountered during the reading of the file, an Isis exception is thrown.
All lines are read in from the file and stored for subsequent parsing. Therefore, parsing can be performed at any time upon returning from this constructor.
csvfile | Name of file to open and read |
header | Indicates if a header exists (true) in the file or not (false) |
skip | Number of lines to skip to header, if it exists, or to the first data line |
delimiter | Indicates the character to be used to delimit each token in the string/line |
keepEmptyParts | Indicates successive delimiters are to be treated as empty tokens (true) or collapsed into one token (false) |
ignoreComments | Indicates whether comment lines are ignored (true) or kept (false) |
Definition at line 81 of file CSVReader.cpp.
References read().
virtual Isis::CSVReader::~CSVReader()
inline, virtual
Destructor (benign)
Definition at line 281 of file CSVReader.h.
void Isis::CSVReader::clear()
inline
Discards all lines read from an input source.
This method discards all lines read from any previous stream. Any subsequent row or column requests will return an empty condition.
Definition at line 484 of file CSVReader.h.
References _lines.
int Isis::CSVReader::columns() const
Determine the number of columns in the input source.
This method applies the parsing conditions to all data lines to determine the number of columns. Note that it is assumed that all lines contain the same number of columns.
If the number of columns varies between lines, the smallest number of columns found in any line is returned, due to the nature of how the columns are determined.
Note that this can be an expensive operation if the input source is large as all lines are parsed. This does not include the header.
Definition at line 113 of file CSVReader.cpp.
References getTable(), and rows().
int Isis::CSVReader::columns(const CSVTable &table) const
Determine the number of columns in a parser CSV Table.
This method computes the number of columns from a CSVTable. This table is a result of the getTable method.
It is assumed each row in the table has the same number of columns after parsing. If one or more of the rows contain a different number of columns, only the smallest number of columns is reported.
table | The table from which the CSVTable rows are obtained |
Definition at line 133 of file CSVReader.cpp.
References getColumnSummary(), Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::key(), and Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::size().
TNT::Array1D< T > Isis::CSVReader::convert(const CSVAxis &data) const
Converts a row or column of data to the specified type.
This method will convert a row or column of data to the specified type. Since this is a template method, it must be invoked explicitly through template syntax. For example, to extract a column by header name and convert it to a double precision array, pass the result of getColumn(hname) to convert<double>().
At present, this class uses the Isis QString class as its token storage type (TokenType). All that is required is that it have a cast operator for a given type. If the Isis QString class has the operator, it can be invoked for that type. The precise statement used to convert the token to the explicit type is:
In this example, s is the individual token and T is the type double as in the previous example.
Note that conversion of special pixel values is not inherently handled by this method. If you anticipate textual representations of special pixels, such as NULL or LIS, it is left to the caller to handle them directly.
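The idea of a templated token conversion can be sketched without the Isis types. The function convertTokens below is hypothetical: it uses std::vector and stream extraction in place of TNT::Array1D and the QString conversion, so it shows the calling pattern rather than the actual implementation.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: convert every token of a row or column to type T
// using stream extraction. A token that fails to parse as a number yields 0.
template <typename T>
std::vector<T> convertTokens(const std::vector<std::string> &tokens) {
  std::vector<T> values;
  values.reserve(tokens.size());
  for (const std::string &token : tokens) {
    std::istringstream in(token);
    T value{};
    in >> value;
    values.push_back(value);
  }
  return values;
}
```

As with convert(), the target type must be given explicitly, for example convertTokens<double>(column). A textual special pixel name such as NULL would silently become 0 in this sketch, which illustrates why such tokens need to be screened by the caller before conversion.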
data | Input row or column |
Definition at line 548 of file CSVReader.h.
References Isis::toDouble().
int Isis::CSVReader::firstRowIndex() const
inline, private
Computes the index of the first data row.
This convenience method computes the index of the first data row considering the number of lines to skip and the existence of a header line.
Definition at line 506 of file CSVReader.h.
References _header, and _skip.
Referenced by getColumn(), getRow(), getTable(), and rows().
CSVReader::CSVAxis Isis::CSVReader::getColumn(int index) const
Parse and return a column specified by index order.
This method extracts a column from each row and returns the result. Note that parsing rules are applied to each row and the column at index is extracted and returned in the array. The array is always the number of rows from the input source (less skipped lines and header if they exist).
It is assumed that every row has the same number of columns (
Column indexes are 0-based, so valid indexes range from 0 to (columns() - 1).
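Column extraction over rows of differing length can be sketched as follows. The names Row and extractColumn are hypothetical and operate on already parsed rows held in std::vector; the real method parses each stored line on demand.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical helper: extract column `index` from already parsed rows.
// The result always has one entry per row; rows too short to contain the
// column contribute an empty (default-constructed) token.
Row extractColumn(const std::vector<Row> &rows, std::size_t index) {
  Row column(rows.size());
  for (std::size_t r = 0; r < rows.size(); ++r) {
    if (index < rows[r].size()) column[r] = rows[r][index];
  }
  return column;
}
```

Because short rows produce empty tokens rather than an error, a caller that cares about column integrity should validate the table before relying on an extracted column.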
index | Zero-based column index to parse and return |
Definition at line 234 of file CSVReader.cpp.
References _delimiter, _keepParts, _lines, firstRowIndex(), Isis::CSVParser< TokenStore >::parse(), rows(), and Isis::CSVParser< TokenStore >::size().
Referenced by getColumn().
CSVReader::CSVAxis Isis::CSVReader::getColumn(const QString &hname) const
Parse and return column specified by header name.
This method will parse and extract the column that corresponds to the named column in the header. This method returns a zero-length array if a header does not exist for this input source or the named column does not exist.
The header is parsed using the same rules as each row. It is the responsibility of the user of this class to specify the existence of a header. Once the header is parsed, a case-insensitive search of the names is performed until the requested column name is found. The index of this header name is then used to extract the column from each row.
It is assumed the column exists in each row. If it does not, a default constructed token is returned for nonexistent columns in a row.
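The case-insensitive header search can be sketched with the standard library. Both toLower and findHeaderIndex are hypothetical helpers, and the comparison is ASCII-only, which is a simplification relative to QString.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lower-case copy of a string for case-insensitive comparison (ASCII only).
std::string toLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Hypothetical helper: zero-based index of `name` in `header`, or -1 if absent.
int findHeaderIndex(const std::vector<std::string> &header,
                    const std::string &name) {
  const std::string wanted = toLower(name);
  for (std::size_t i = 0; i < header.size(); ++i) {
    if (toLower(header[i]) == wanted) return static_cast<int>(i);
  }
  return -1;
}
```

The sketch compares names exactly apart from case, so a header token with surrounding whitespace does not match its trimmed name. The index found this way would then be passed to the index-based column extraction.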
hname | Name of the column as it exists in the header |
Definition at line 279 of file CSVReader.cpp.
References getColumn(), and getHeader().
CSVReader::CSVColumnSummary Isis::CSVReader::getColumnSummary(const CSVTable &table) const
Computes a row summary of the number of distinct columns in table.
A CSVColumnSummary is a CollectorMap where the key is the number of columns and the value is the number of rows that contain that number of columns. This is useful to determine the consistency of a parsed input source, that is, whether every row contains the same number of columns.
Once this summary is computed for a consistent table, there should exist one and only one element in the summary, where the key is the column count for each row and the value of that key is the number of rows that contain those columns.
This example shows how to determine this information:
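A self-contained sketch of the same bookkeeping is given here, with std::map standing in for CollectorMap and std::vector for the table types. The names summarizeColumns, isConsistent and minColumns are hypothetical, and treating an empty table as consistent is an assumption of the sketch.

```cpp
#include <map>
#include <string>
#include <vector>

using Row = std::vector<std::string>;
using ColumnSummary = std::map<int, int>;  // column count -> number of rows

// Count how many rows have each distinct number of columns.
ColumnSummary summarizeColumns(const std::vector<Row> &table) {
  ColumnSummary summary;
  for (const Row &row : table) {
    ++summary[static_cast<int>(row.size())];
  }
  return summary;
}

// A table is consistent when at most one distinct column count occurs.
bool isConsistent(const std::vector<Row> &table) {
  return summarizeColumns(table).size() <= 1;
}

// Smallest column count over all rows (0 for an empty table).
int minColumns(const std::vector<Row> &table) {
  const ColumnSummary summary = summarizeColumns(table);
  return summary.empty() ? 0 : summary.begin()->first;
}
```

A summary with more than one entry signals an inconsistent table, and because the map is ordered by key, its first entry gives the smallest column count, which matches the value that columns() is documented to report.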
table | Input table as returned by the getTable method |
Definition at line 360 of file CSVReader.cpp.
References Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::add(), Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::exists(), and Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::get().
Referenced by columns(), and isTableValid().
char Isis::CSVReader::getDelimiter() const
inline
Reports the character used to delimit tokens in strings.
Definition at line 425 of file CSVReader.h.
References _delimiter.
CSVReader::CSVAxis Isis::CSVReader::getHeader() const
Retrieve the header from the input source if it exists.
This method will return the header, if it exists, after applying the parsing rules.
The existence of the header is determined entirely by the user of this class. If the header does not exist, a zero-length array is returned.
Note that this routine does not trim leading or trailing whitespace from each header. This must be handled by the caller.
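Since trimming is left to the caller, a small helper along the following lines may be useful. Both trim and trimAll are hypothetical and work on std::string tokens; with QString tokens the equivalent would be done with Qt's own facilities.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Remove leading and trailing spaces, tabs and line-end characters.
std::string trim(const std::string &s) {
  const char *ws = " \t\r\n";
  const std::size_t first = s.find_first_not_of(ws);
  if (first == std::string::npos) return "";
  const std::size_t last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

// Apply trim() to every token of a header (or any row).
std::vector<std::string> trimAll(const std::vector<std::string> &tokens) {
  std::vector<std::string> out;
  out.reserve(tokens.size());
  for (const std::string &t : tokens) out.push_back(trim(t));
  return out;
}
```

Trimming matters most for files written with a space after each comma, where an untrimmed header token would carry a leading space.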
Definition at line 184 of file CSVReader.cpp.
References _delimiter, _header, _keepParts, _lines, _skip, and rows().
Referenced by getColumn().
CSVReader::CSVAxis Isis::CSVReader::getRow(int index) const
Parse and return the requested row by index.
This method will parse and return the requested row from the input source as an array. If the requested row is determined to be an invalid index, then a zero-length array is returned. It is up to the caller to check for validity of the returned row array.
index | Index of the desired row to return |
Definition at line 204 of file CSVReader.cpp.
References _delimiter, _keepParts, _lines, firstRowIndex(), and rows().
int Isis::CSVReader::getSkip() const
inline
Reports the number of lines to skip.
This is the number of lines to skip to get to the header, if one exists, or to the first row of data to parse.
Definition at line 363 of file CSVReader.h.
References _skip.
CSVReader::CSVTable Isis::CSVReader::getTable() const
Parse and return all rows and columns in a table array.
This method returns a 2-D table of all rows and columns after parsing rules are applied. Each column or token in each row is returned as a CSVParser::TokenType. Subsequent conversion can be performed if the type sufficiently supports it or the user can provide its own conversion techniques.
The validity of the table with regards to column integrity (same number of columns in each row) can be checked with the isTableValid method. A summary of the number of rows containing differing numbers of columns is provided by the getColumnSummary method.
The returned table does not include the header row or any skipped rows. An empty table (a zero-length array) is returned if no rows are present.
The table itself is a 1-dimensional array that contains a row at each element. This conceptually is a 2-dimensional table. Each element in the row (first) dimension of the table is a CSVAxis array containing parsed columns or tokens. Note that the number of columns may vary from row to row.
Definition at line 318 of file CSVReader.cpp.
References _delimiter, _keepParts, _lines, firstRowIndex(), Isis::CSVParser< TokenStore >::parse(), Isis::CSVParser< TokenStore >::result(), and rows().
Referenced by columns().
bool Isis::CSVReader::haveHeader() const
inline
Returns true if a header is present in the input source.
The existence of a header line is always determined by the user of this class. See the setHeader() method for additional information on header maintenance.
Definition at line 375 of file CSVReader.h.
References _header.
bool Isis::CSVReader::isTableValid(const CSVTable &table) const
Indicates if all rows have the same number of columns.
This method checks the integrity of all rows in the input source as to whether they have the same number of columns.
table | Input table to check for integrity/validity |
Definition at line 388 of file CSVReader.cpp.
References getColumnSummary(), and Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::size().
bool Isis::CSVReader::keepEmptyParts() const
inline
Returns true when successive delimiters are preserved as empty tokens, false when they are treated as one separator.
Definition at line 461 of file CSVReader.h.
References _keepParts.
std::istream & Isis::CSVReader::load(std::istream &ifile)
private
Reads all lines from the input stream until an EOF is encountered.
This method is used to read all lines of text from an input stream until an end-of-file (EOF) is encountered. It is used to perform read operations for all sources of input, both files and direct streams, as supplied by the users of this class.
All lines are assumed to end with a newline sequence pertinent to the systems this software is compiled on. All lines are stored as they are read in unless they are empty lines. The default behavior is to treat all lines that begin with a '#' as a comment. These lines are ignored by default and excluded as they are read. (Comment and blank line feature was added 2010/04/08.)
As lines are read in from the input stream, they are pushed onto the internal stack in the order they are read. The calling environment is responsible for the state of the stack as to whether it is cleared or appended to an existing state.
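The filtering performed while loading can be sketched as below. The function loadLines is hypothetical; it treats only truly empty lines as blank, so whitespace-only lines are kept, which is an assumption about the intended rule.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: append every retained line of `in` to `lines`.
// Empty lines are always dropped; lines starting with '#' are dropped
// only when ignoreComments is true.
std::istream &loadLines(std::istream &in, std::vector<std::string> &lines,
                        bool ignoreComments = true) {
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;
    if (ignoreComments && line[0] == '#') continue;
    lines.push_back(line);
  }
  return in;
}
```

The function appends to whatever `lines` already holds, leaving it to the caller to clear the container first when replacement is wanted. Feeding it a std::istringstream is a convenient way to try the comment rule without a file.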
ifile | Input source stream of lines of text |
Definition at line 418 of file CSVReader.cpp.
References _FILEINFO_, _ignoreComments, _lines, and Isis::IException::User.
Referenced by Isis::operator>>(), and read().
void Isis::CSVReader::read(const QString &csvfile)
Reads the entire contents of a file for subsequent parsing.
This method opens the specified file and reads every line, storing them in this object. It is assumed this file is a text file. Other methods in this class can be utilized to set parsing conditions before or after the file has been read.
Note that parsing the file is deferred until explicitly invoked through other methods in this class. Users of this class can extract individual rows, columns or the complete table.
This object is reentrant. Additional files can be read in. Any existing data from previous input sources is discarded upon subsequent reads.
csvfile | Name of file to read |
Definition at line 156 of file CSVReader.cpp.
References _FILEINFO_, _lines, load(), and Isis::IException::User.
Referenced by CSVReader().
int Isis::CSVReader::rows() const
inline
Reports the number of rows in the table.
This method returns only the number of rows of data. This count does not include skipped lines or the header line if either exists. Note that if no lines are skipped and no header exists, this count will be identical to size().
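The relationship between size(), the skip count, the header flag and rows() amounts to simple arithmetic, sketched here with free functions. The names are hypothetical, and clamping the result at zero when fewer lines exist than are skipped is an assumption of the sketch.

```cpp
// Index of the first data line: skipped lines come first, then an optional header.
int firstDataIndex(int skip, bool header) { return skip + (header ? 1 : 0); }

// Number of data rows among `size` stored lines; never negative.
int dataRows(int size, int skip, bool header) {
  const int n = size - firstDataIndex(skip, header);
  return n < 0 ? 0 : n;
}
```

For a source of 10 stored lines with 2 skipped lines and a header, the first data row is at index 3 and 7 data rows remain; with no skipped lines and no header the row count equals the line count.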
Definition at line 301 of file CSVReader.h.
References _lines, and firstRowIndex().
Referenced by columns(), getColumn(), getHeader(), getRow(), and getTable().
void Isis::CSVReader::setComment(const bool ignore = true)
inline
Allows the user to indicate comment disposition.
Comments are indicated in a CSV file by a '#' sign in the first column. If they are present, the default is to ignore them and discard them when they are read in. This method allows the user to specify how to treat lines that begin with a '#', in the off chance they are part of the data.
Comment lines are not part of the skip lines parameter unless this is set to false. Then skip lines will include lines that start with a '#' if they exist.
Also note that any and all blank/empty lines are discarded and not included in any count, including the skip line count.
ignore | True indicates lines that start with a '#' are considered a comment and are discarded. False will not discard these lines but include them in the parsing content. |
Definition at line 329 of file CSVReader.h.
References _ignoreComments.
void Isis::CSVReader::setDelimiter(const char &delimiter)
inline
Set the delimiter character that separates tokens in the strings.
This method allows the user of this class to indicate the character that separates individual tokens in each row, including the header line.
One must ensure the delimiter character is not within tokens (such as comma delimited strings) or incorrect parsing will occur.
delimiter | Single character that delimits tokens in each string |
Definition at line 416 of file CSVReader.h.
References _delimiter.
void Isis::CSVReader::setHeader(const bool gotIt = true)
inline
Allows the user to indicate header disposition.
The determination of a header is entirely up to the user of this class. If a header exists, the user must indicate this with a true parameter to this method. That line is excluded from the row-by-row and column data parsing operations. If no header exists, provide false to this method.
It is assumed that headers exist immediately prior to data rows and any skipped lines precede the header line. Only one line is presumed to be a header.
Note that this method can be set at any time in the process of reading from a file or stream source as parsing is done on demand and not at the time the source is read in.
gotIt | True indicates the presence of a header, false indicates one does not exist. |
Definition at line 399 of file CSVReader.h.
References _header.
void Isis::CSVReader::setKeepEmptyParts()
inline
Indicate multiple occurrences of delimiters are empty tokens.
Use of this method indicates that when multiple instances of the delimiting character occur in succession, they should be treated as empty tokens. This is useful when input sources truly have empty fields.
Definition at line 437 of file CSVReader.h.
References _keepParts.
void Isis::CSVReader::setSkip(int nskip)
inline
Indicate the number of lines at the top of the source to skip to data.
This method allows the user to indicate the number of lines that are to be ignored at the beginning of the input source. These lines may contain any text, but are persistently ignored for all row and column parsing operations.
Note that this should not include a header line if one exists as the header methods maintain that information for parsing operations. It is assumed that header lines always follow skipped lines and immediately precede data lines.
This count does not include comment lines (first character is a '#') if they are ignored (the default), nor does it include blank lines.
nskip | Number of lines to skip |
Definition at line 351 of file CSVReader.h.
References _skip.
void Isis::CSVReader::setSkipEmptyParts()
inline
Indicate multiple occurrences of delimiters are one token.
Use of this method indicates that when multiple instances of the delimiting character occur in succession, they should be treated as a single separator. This is useful when input sources have space separated tokens. Frequently, there are many spaces between values when spaces are used as the delimiting character. Call this method when spaces are used as token delimiters.
Definition at line 451 of file CSVReader.h.
References _keepParts.
int Isis::CSVReader::size() const
inline
Reports the total number of lines read from the stream.
Definition at line 287 of file CSVReader.h.
References _lines.
std::istream & operator>>(std::istream &is, CSVReader &csv)
friend
Input read operator for input stream sources.
This input operator can be invoked directly from the users environment to read the complete input source. It can also be used to augment an existing source as this technique does not discard existing data (lines).
It is presumed that any additional input sources are consistent with the pre-established parsing guidelines; otherwise, the integrity of the table is compromised.
Here is an example of how to use this method:
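The following self-contained sketch shows the append behavior of a stream operator next to the replace behavior of a read method. LineStore is a hypothetical stand-in for the line storage of this class; its read() takes a stream rather than a file name to keep the sketch free of file access.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal line store mirroring the append/replace distinction.
class LineStore {
 public:
  std::size_t size() const { return lines_.size(); }
  void clear() { lines_.clear(); }

  // Replace semantics: discard earlier lines, then load the new source.
  void read(std::istream &in) {
    clear();
    in >> *this;
  }

  // Append semantics: existing lines are kept and new ones are added.
  friend std::istream &operator>>(std::istream &in, LineStore &store) {
    std::string line;
    while (std::getline(in, line)) {
      if (!line.empty()) store.lines_.push_back(line);
    }
    return in;
  }

 private:
  std::vector<std::string> lines_;
};
```

Streaming two sources into the same object accumulates the lines of both, while a subsequent read() leaves only the lines of the last source. Accumulated sources should share the same delimiter, header and skip conventions, since one set of parsing rules is applied to all stored lines.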
is | Input stream source |
csv | CSVReader object to read input source lines from |
Definition at line 463 of file CSVReader.cpp.
char Isis::CSVReader::_delimiter
private
Separator of values.
Definition at line 492 of file CSVReader.h.
Referenced by getColumn(), getDelimiter(), getHeader(), getRow(), getTable(), and setDelimiter().
bool Isis::CSVReader::_header
private
Indicates presence of header.
Definition at line 490 of file CSVReader.h.
Referenced by firstRowIndex(), getHeader(), haveHeader(), and setHeader().
bool Isis::CSVReader::_ignoreComments
private
Ignore comments on read.
Definition at line 495 of file CSVReader.h.
Referenced by load(), and setComment().
bool Isis::CSVReader::_keepParts
private
Keep empty parts between delimiters.
Definition at line 493 of file CSVReader.h.
Referenced by getColumn(), getHeader(), getRow(), getTable(), keepEmptyParts(), setKeepEmptyParts(), and setSkipEmptyParts().
CSVList Isis::CSVReader::_lines
private
List of lines from file.
Definition at line 494 of file CSVReader.h.
Referenced by clear(), getColumn(), getHeader(), getRow(), getTable(), load(), read(), rows(), and size().
int Isis::CSVReader::_skip
private
Number of lines to skip.
Definition at line 491 of file CSVReader.h.
Referenced by firstRowIndex(), getHeader(), getSkip(), and setSkip().