Isis 3.0 Object Programmers' Reference |
#include <CSVReader.h>

The class reads text strings from an input source stream or file in which each line (string) contains a single-character delimiter that separates it into tokens. The input stream is text in nature, and each line is terminated with a newline as appropriate for the computer system.
This class provides methods that support skipping irrelevant lines and recognizing and utilizing a header line. Tokens within a given line are separated by a single character. Consecutive delimiter characters can be treated as empty tokens (columns) or collapsed into a single separator. Typically, treating consecutive delimiters as empty strings is used for comma separated values (CSV), whereas space delimited strings often require multiple spaces to be treated as a single separator. This class supports both cases.
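The two delimiter policies described above can be illustrated with a minimal standalone sketch. It uses only the C++ standard library; the helper name splitLine is hypothetical and is not part of the Isis API, and the real parser may differ in edge cases such as leading or trailing delimiters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of Isis: splits `line` on a single-character
// delimiter. When keepEmpty is true, consecutive delimiters yield empty
// tokens; when false, empty tokens are dropped so that a run of delimiters
// acts as one separator.
std::vector<std::string> splitLine(const std::string &line, char delimiter,
                                   bool keepEmpty) {
  std::vector<std::string> tokens;
  std::string current;
  for (char c : line) {
    if (c == delimiter) {
      if (keepEmpty || !current.empty()) tokens.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  // The text after the last delimiter is a token as well.
  if (keepEmpty || !current.empty()) tokens.push_back(current);
  return tokens;
}
```

With this sketch, splitting "a,,b" on a comma while keeping empty parts gives three tokens with an empty middle token, and splitting "1   2  3" on a space while skipping empty parts also gives three tokens.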
Each text line in the input source is read and stored in an internal stack. Parsing takes place only when explicitly requested; no parsing is performed while the input source is read. This approach allows users of this class to alter or otherwise adjust parsing conditions after the input source has been internalized. This makes the implementation efficient and flexible, delegating more control to the users of this class.
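The deferred parsing approach can be sketched as follows. This is a standalone illustration under stated assumptions, not the Isis implementation: the class name LazyLines and its members are hypothetical, and std::getline is used as a simple tokenizer.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the reader: raw lines are stored untouched and
// are parsed only when a row is requested.
class LazyLines {
 public:
  void add(const std::string &line) { lines_.push_back(line); }
  void setDelimiter(char d) { delimiter_ = d; }

  // Parsing happens here, every time a row is requested, using whatever
  // delimiter is in effect at that moment.
  std::vector<std::string> row(std::size_t index) const {
    std::vector<std::string> tokens;
    if (index >= lines_.size()) return tokens;  // invalid index: empty result
    std::istringstream in(lines_[index]);
    std::string token;
    while (std::getline(in, token, delimiter_)) tokens.push_back(token);
    return tokens;
  }

 private:
  std::vector<std::string> lines_;
  char delimiter_ = ',';
};
```

Because only the raw line is stored, the same stored line "a;b,c" can be parsed first with a comma and then again with a semicolon without re-reading the source.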
The mechanism by which parsed data is stored and returned to the caller's environment makes this class efficient. The returned rows, columns and tables use memory reference counting. This allows parsed data to be exported with virtually no cost to the calling environment in terms of efficiency. It does, however, lend itself to utilization issues. Reference counting means that all instances of a parsed row, column or table refer to the same copy of the data, and a change in one instance of those elements is reflected in all instances of that same element. Note that this concern rests entirely on how the caller's environment utilizes returned data, as only the original lines read from the input source are maintained internal to objects.
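The aliasing effect of reference counting can be illustrated with a small standalone sketch. It uses std::shared_ptr as a stand-in for the reference-counted TNT arrays; the names SharedRow, editThroughAlias and deepCopy are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical reference-counted row: every copy of the handle shares one
// underlying buffer.
using SharedRow = std::shared_ptr<std::vector<std::string>>;

// Edits the first token through a second handle and returns what the
// original handle sees afterwards.
std::string editThroughAlias(SharedRow original) {
  SharedRow alias = original;  // cheap: only the reference count changes
  (*alias)[0] = "changed";     // visible through every handle
  return (*original)[0];
}

// Makes an independent copy so later edits do not affect other holders.
SharedRow deepCopy(const SharedRow &row) {
  return std::make_shared<std::vector<std::string>>(*row);
}
```

A caller that intends to modify returned data without affecting other holders of the same row would work on an independent copy, as deepCopy does here.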
The following example demonstrates how to use this class to read a comma delimited file in which consecutive commas should be treated as empty columns. Furthermore, there are 2 lines to skip and a header line as well:
cout << "\n\nProcessing comma table...\n";
std::string csvfile("comma.csv");
CSVReader csv(csvfile, true, 2, ',', true);
Another way to ingest this file using methods instead of the constructor is as follows:
cout << "\n\nProcessing comma table using methods...\n";
std::string csvfile("comma.csv");
CSVReader csv;
csv.setSkip(2);
csv.setHeader(true);
csv.setDelimiter(',');
csv.setKeepEmptyParts();
csv.read(csvfile);
Definition at line 234 of file CSVReader.h.
Public Types | |
| typedef Parser::TokenList | CSVAxis |
| Row/Column token list. | |
| typedef TNT::Array1D< CSVAxis > | CSVTable |
| Table of all rows/columns. | |
| typedef CollectorMap< int, int > | CSVColumnSummary |
| Column summary for all rows. | |
| typedef TNT::Array1D< double > | CSVDblVector |
| Double array def. | |
| typedef TNT::Array1D< int > | CSVIntVector |
| Integer array def. | |
Public Member Functions | |
| CSVReader () | |
| Default constructor for CSV reader. | |
| CSVReader (const std::string &csvfile, bool header=false, int skip=0, const char &delimiter= ',', bool keepEmptyParts=true) | |
| Parameterized constructor for parsing an input file source. | |
| virtual | ~CSVReader () |
| Destructor (benign). | |
| int | size () const |
| Reports the total number of lines read from the stream. | |
| int | rows () const |
| Reports the number of rows in the table. | |
| int | columns () const |
| Determine the number of columns in the input source. | |
| int | columns (const CSVTable &table) const |
| Determine the number of columns in a parser CSV Table. | |
| void | setSkip (int nskip) |
| Indicate the number of lines at the top of the source to skip to data. | |
| int | getSkip () const |
| Reports the number of lines to skip. | |
| bool | haveHeader () const |
| Returns true if a header is present in the input source. | |
| void | setHeader (const bool gotIt=true) |
| Allows the user to indicate header disposition. | |
| void | setDelimiter (const char &delimiter) |
| Set the delimiter character that separates tokens in the strings. | |
| char | getDelimiter () const |
| Reports the character used to delimit tokens in strings. | |
| void | setKeepEmptyParts () |
| Indicate multiple occurrences of delimiters are empty tokens. | |
| void | setSkipEmptyParts () |
| Indicate multiple occurrences of delimiters are one token. | |
| bool | keepEmptyParts () const |
| Returns true when preserving successive tokens, false when they are treated as one token. | |
| void | read (const std::string &fname) throw (iException &) |
| Reads the entire contents of a file for subsequent parsing. | |
| CSVAxis | getHeader () const |
| Retrieve the header from the input source if it exists. | |
| CSVAxis | getRow (int index) const |
| Parse and return the requested row by index. | |
| CSVAxis | getColumn (int index) const |
| Parse and return a column specified by index order. | |
| CSVAxis | getColumn (const std::string &hname) const |
| Parse and return column specified by header name. | |
| CSVTable | getTable () const |
| Parse and return all rows and columns in a table array. | |
| bool | isTableValid (const CSVTable &table) const |
| Indicates if all rows have the same number of columns. | |
| CSVColumnSummary | getColumnSummary (const CSVTable &table) const |
| Computes a row summary of the number of distinct columns in table. | |
| template<typename T> | |
| TNT::Array1D< T > | convert (const CSVAxis &data) const |
| Converts a row or column of data to the specified type. | |
| void | clear () |
| Discards all lines read from an input source. | |
Private Types | |
| typedef CSVParser< iString > | Parser |
| Defines single line parser. | |
| typedef std::vector< std::string > | CSVList |
| Input source line container. | |
Private Member Functions | |
| int | firstRowIndex () const |
| Computes the index of the first data. | |
| std::istream & | load (std::istream &ifile) |
| Reads all lines from the input stream until an EOF is encountered. | |
Private Attributes | |
| bool | _header |
| Indicates presence of header. | |
| int | _skip |
| Number of lines to skip. | |
| char | _delimiter |
| Separator of values. | |
| bool | _keepParts |
| Keep empty parts between delimiter. | |
| CSVList | _lines |
| List of lines from file. | |
Friends | |
| std::istream & | operator>> (std::istream &is, CSVReader &csv) |
| Input read operator for input stream sources. | |
| typedef CollectorMap<int,int> Isis::CSVReader::CSVColumnSummary |
| typedef TNT::Array1D<double> Isis::CSVReader::CSVDblVector |
| typedef TNT::Array1D<int> Isis::CSVReader::CSVIntVector |
typedef std::vector<std::string> Isis::CSVReader::CSVList [private] |
| typedef TNT::Array1D<CSVAxis> Isis::CSVReader::CSVTable |
typedef CSVParser<iString> Isis::CSVReader::Parser [private] |
| Isis::CSVReader::CSVReader | ( | ) |
Default constructor for CSV reader.
The default constructor sets up to read a source that has no header and skips no lines. It also sets the delimiter to the comma, as implied by its name (CSV = comma separated value), and treats multiple successive occurrences of the delimiting character as individual tokens (keeping empty parts).
This method can be used when deferring the reading of the input source. Other methods available in this class can be used to adjust the behavior of the parsing before and after reading of the source, as parsing is performed on demand. This means a single input source can be parsed repeatedly after adjusting parameters.
Definition at line 51 of file CSVReader.cpp.
| Isis::CSVReader::CSVReader | ( | const std::string & | csvfile, | bool | header = false, | int | skip = 0, | const char & | delimiter = ',', | bool | keepEmptyParts = true | ) |
Parameterized constructor for parsing an input file source.
This constructor can be used when the input source is an identified file. Parameters are available for specifying the parsing behavior, but are not necessarily required here as defaults are provided. Other methods in this class can set parsing conditions after the input file has been read in.
If the file cannot be opened or an error is encountered during the reading of the file, an Isis exception is thrown.
All lines are read in from the file and stored for subsequent parsing. Therefore, parsing can be performed at any time upon returning from this constructor.
| csvfile | Name of file to open and read | |
| header | Indicates if a header exists (true) in the file or not (false) | |
| skip | Number of lines to skip to header, if it exists, or to the first data line | |
| delimiter | Indicates the character to be used to delimit each token in the string/line | |
| keepEmptyParts | Indicates successive delimiters are to be treated as empty tokens (true) or collapsed into one token (false) |
Definition at line 80 of file CSVReader.cpp.
References read().
| virtual Isis::CSVReader::~CSVReader | ( | ) | [inline, virtual] |
| void Isis::CSVReader::clear | ( | ) | [inline] |
Discards all lines read from an input source.
This method discards all lines read from any previous stream. Any subsequent row or column requests will return an empty condition.
Definition at line 409 of file CSVReader.h.
References _lines.
| int Isis::CSVReader::columns | ( | const CSVTable & | table | ) | const |
Determine the number of columns in a parser CSV Table.
This method computes the number of columns from a CSVTable. This table is a result of the getTable method.
It is assumed each row in the table has the same number of columns after parsing. If one or more of the rows contain differing columns, only the smallest number of columns are reported.
| table | The table from which the CSVTable rows are obtained |
Definition at line 130 of file CSVReader.cpp.
References getColumnSummary(), Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::key(), and Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::size().
| int Isis::CSVReader::columns | ( | ) | const |
Determine the number of columns in the input source.
This method applies the parsing conditions to all data lines to determine the number of columns. Note that it is assumed that all lines contain the same number of columns.
If the number of columns vary in any of the lines, the least number of columns found in all lines is returned due to the nature of how the columns are determined.
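The "least number of columns" result described above can be reproduced with a short standalone sketch. It is not the Isis implementation, which derives the value from the column summary; the helper name minColumns and the Row alias are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical helper: smallest column count over all rows, or 0 when the
// table has no rows.
std::size_t minColumns(const std::vector<Row> &table) {
  if (table.empty()) return 0;
  std::size_t smallest = table.front().size();
  for (const Row &row : table) smallest = std::min(smallest, row.size());
  return smallest;
}
```

For a table with one row of three tokens and one row of two tokens, this sketch reports two columns.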
Definition at line 110 of file CSVReader.cpp.
References getTable(), and rows().
| TNT::Array1D< T > Isis::CSVReader::convert | ( | const CSVAxis & | data | ) | const |
Converts a row or column of data to the specified type.
This method will convert a row or column of data to the specified type. Since this is a template method, it must be invoked explicitly through template syntax. Here is an example to extract a column by a header name and convert it to a double precision array:
// Convert column 0/1 to double
CSVReader::CSVAxis scol = csv.getColumn("0/1");
CSVReader::CSVDblVector dcol = csv.convert<double>(scol);
At present, this class uses the Isis iString class as its token storage type (TokenType). All that is required is that it have a cast operator for a given type. If the Isis iString class has the operator, it can be invoked for that type. The precise statement used to convert the token to the explicit type is:
out[i] = (T) s;
Note that conversion of specific special pixel values is not inherently handled by this method. If you anticipate textual representations of special pixels, such as NULL, LIS, etc., handling them is left up to the caller.
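The element-by-element conversion can be sketched without the Isis types as follows. This is an illustration only: the name convertTokens is hypothetical, std::vector stands in for TNT::Array1D, and stream extraction stands in for the iString cast operator.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for convert<T>(): converts each token with a stream
// extraction instead of the iString cast used by the real class.
template <typename T>
std::vector<T> convertTokens(const std::vector<std::string> &tokens) {
  std::vector<T> out(tokens.size());
  for (std::size_t i = 0; i < tokens.size(); ++i) {
    std::istringstream in(tokens[i]);
    if (!(in >> out[i])) out[i] = T();  // unparsable token: value-initialized
  }
  return out;
}
```

In this sketch a token such as NULL silently becomes zero, which is rarely what a caller wants. That is one reason to screen special pixel names before converting.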
| data | Input row or column |
Definition at line 468 of file CSVReader.h.
| int Isis::CSVReader::firstRowIndex | ( | ) | const [inline, private] |
Computes the index of the first data.
This convenience method computes the index of the first data row considering the number of lines to skip and the existence of a header line.
Definition at line 428 of file CSVReader.h.
References _header, and _skip.
Referenced by getColumn(), getRow(), getTable(), and rows().
| CSVReader::CSVAxis Isis::CSVReader::getColumn | ( | const std::string & | hname | ) | const |
Parse and return column specified by header name.
This method will parse and extract the column that corresponds to a named column in the header. This method returns a zero-length array if a header does not exist for this input source or the named column does not exist.
The header is parsed using the same rules as each row. It is the responsibility of the user of this class to specify the existence of a header. Once the header is parsed, a case-insensitive search of the names is performed until the requested column name is found. The index of this header name is then used to extract the column from each row.
It is assumed the column exists in each row. If it does not, a default constructed token is returned for non-existent columns in a row.
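The lookup described above can be sketched with standard containers. This is a standalone illustration, not the Isis code: the names equalNoCase and columnByName are hypothetical, the comparison handles ASCII only, and an empty string plays the role of the default constructed token.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Case-insensitive string comparison (ASCII only).
bool equalNoCase(const std::string &a, const std::string &b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
           return std::tolower(static_cast<unsigned char>(x)) ==
                  std::tolower(static_cast<unsigned char>(y));
         });
}

// Hypothetical helper: returns the named column, or an empty vector when the
// name is not in the header. Rows that are too short contribute an empty
// token.
Row columnByName(const Row &header, const std::vector<Row> &rows,
                 const std::string &name) {
  for (std::size_t i = 0; i < header.size(); ++i) {
    if (!equalNoCase(header[i], name)) continue;
    Row column;
    for (const Row &row : rows) {
      column.push_back(i < row.size() ? row[i] : std::string());
    }
    return column;
  }
  return Row();
}
```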
| hname | Name of the column as it exists in the header |
Definition at line 270 of file CSVReader.cpp.
References Isis::iString::Equal(), getColumn(), and getHeader().
| CSVReader::CSVAxis Isis::CSVReader::getColumn | ( | int | index | ) | const |
Parse and return a column specified by index order.
This method extracts a column from each row and returns the result. Note that parsing rules are applied to each row and the column at index is extracted and returned in the array. The array is always the number of rows from the input source (less skipped lines and header if they exist).
It is assumed that every row has the same number of columns (
| index | Zero-based column index to parse and return |
Definition at line 227 of file CSVReader.cpp.
References _delimiter, _keepParts, _lines, firstRowIndex(), Isis::CSVParser< TokenStore >::parse(), rows(), and Isis::CSVParser< TokenStore >::size().
Referenced by getColumn().
| CSVReader::CSVColumnSummary Isis::CSVReader::getColumnSummary | ( | const CSVTable & | table | ) | const |
Computes a row summary of the number of distinct columns in table.
A CSVColumnSummary is a CollectorMap where the key is the number of columns and the value is the number of rows that contain that number of columns. This is useful to determine the consistency of a parser input source such that every row contains the same number of columns.
Once this summary is computed, there should exist one and only one element in the summary where the key is the column count for each row and the value of that key is the number of rows that contain those columns.
This example shows how to determine this information:
CSVReader::CSVTable table = csv.getTable();
CSVReader::CSVColumnSummary summary = csv.getColumnSummary(table);
cout << "Number of columns: " << csv.columns(table) << endl;
cout << "Number distinct columns: " << summary.size() << endl;
for (int ncols = 0 ; ncols < summary.size() ; ncols++) {
  cout << "--> " << summary.getNth(ncols) << " rows have " << summary.key(ncols) << " columns." << endl;
}
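How such a summary is built can be sketched with std::map standing in for CollectorMap. This is an assumption-laden illustration rather than the Isis implementation; the function name columnSummary and the Row alias are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical stand-in for CSVColumnSummary: key = column count,
// value = number of rows with that many columns.
std::map<int, int> columnSummary(const std::vector<Row> &table) {
  std::map<int, int> summary;
  for (const Row &row : table) ++summary[static_cast<int>(row.size())];
  return summary;
}
```

A summary holding exactly one entry means every row has the same number of columns. Because std::map orders its keys, the first entry carries the smallest column count.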
| table | Input table as returned by the getTable method |
Definition at line 351 of file CSVReader.cpp.
References Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::add(), Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::exists(), and Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::get().
Referenced by columns(), and isTableValid().
| char Isis::CSVReader::getDelimiter | ( | ) | const [inline] |
Reports the character used to delimit tokens in strings.
Definition at line 359 of file CSVReader.h.
References _delimiter.
| CSVReader::CSVAxis Isis::CSVReader::getHeader | ( | ) | const |
Retrieve the header from the input source if it exists.
This method will return the header, if it exists, after applying the parsing rules.
The existence of the header is determined entirely by the user of this class. If the header does not exist, a zero-length array is returned.
Note that this routine does not trim leading or trailing whitespace from each header. This must be handled by the caller.
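Since trimming is left to the caller, a small helper such as the following could be applied to each returned header token. This is a minimal sketch using only the standard library; the name trim is hypothetical and is not an Isis function.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: strips leading and trailing whitespace from a token.
std::string trim(const std::string &token) {
  const char *whitespace = " \t\r\n";
  const std::string::size_type first = token.find_first_not_of(whitespace);
  if (first == std::string::npos) return std::string();  // all whitespace
  const std::string::size_type last = token.find_last_not_of(whitespace);
  return token.substr(first, last - first + 1);
}
```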
Definition at line 181 of file CSVReader.cpp.
References _delimiter, _header, _keepParts, _lines, _skip, and rows().
Referenced by getColumn().
| CSVReader::CSVAxis Isis::CSVReader::getRow | ( | int | index | ) | const |
Parse and return the requested row by index.
This method will parse and return the requested row from the input source as an array. If the requested row is determined to be an invalid index, then a zero-length array is returned. It is up to the caller to check for validity of the returned row array.
| index | Index of the desired row to return |
Definition at line 199 of file CSVReader.cpp.
References _delimiter, _keepParts, _lines, firstRowIndex(), and rows().
| int Isis::CSVReader::getSkip | ( | ) | const [inline] |
Reports the number of lines to skip.
This is the number of lines to skip to get to the header, if one exists, or to the first row of data to parse.
Definition at line 306 of file CSVReader.h.
References _skip.
| CSVReader::CSVTable Isis::CSVReader::getTable | ( | ) | const |
Parse and return all rows and columns in a table array.
This method returns a 2-D table of all rows and columns after parsing rules are applied. Each column or token in each row is returned as a CSVParser::TokenType. Subsequent conversion can be performed if the type sufficiently supports it or the user can provide its own conversion techniques.
The validity of the table with regards to column integrity (same number of columns in each row) can be checked with the isTableValid method. A summary of the number of rows containing differing numbers of columns is provided by the getColumnSummary method.
The returned table does not include the header row or any skipped rows. An empty table, zero-length array is returned if no rows are present.
The table itself is a 1-dimensional array that contains a row at each element. This conceptually is a 2-dimensional table. Each element in the row (first) dimension of the table is a CSVAxis array containing parsed columns or tokens. Note that the number of columns may vary from row to row.
Definition at line 309 of file CSVReader.cpp.
References _delimiter, _keepParts, _lines, firstRowIndex(), Isis::CSVParser< TokenStore >::parse(), Isis::CSVParser< TokenStore >::result(), rows(), and table.
Referenced by columns().
| bool Isis::CSVReader::haveHeader | ( | ) | const [inline] |
Returns true if a header is present in the input source.
The existence of a header line is always determined by the user of this class. See the setHeader() method for additional information on header maintenance.
Definition at line 315 of file CSVReader.h.
References _header.
| bool Isis::CSVReader::isTableValid | ( | const CSVTable & | table | ) | const |
Indicates if all rows have the same number of columns.
This method checks the integrity of all rows in the input source as to whether they have the same number of columns.
| table | Input table to check for integrity/validity |
Definition at line 379 of file CSVReader.cpp.
References getColumnSummary(), and Isis::CollectorMap< K, T, ComparePolicy, RemovalPolicy, CopyPolicy >::size().
| bool Isis::CSVReader::keepEmptyParts | ( | ) | const [inline] |
Returns true when preserving successive tokens, false when they are treated as one token.
Definition at line 388 of file CSVReader.h.
References _keepParts.
| std::istream & Isis::CSVReader::load | ( | std::istream & | ifile | ) | [private] |
Reads all lines from the input stream until an EOF is encountered.
This method is used to read all lines of text from an input stream until an end-of-file (EOF) is encountered. It is used to perform read operations for all sources of input, both files and direct streams, as supplied by the users of this class.
All lines are assumed to end with a newline sequence pertinent to the system this software is compiled on. All lines are stored as they are read; no parsing is performed at this stage.
As lines are read in from the input stream, they are pushed onto the internal stack in the order they are read. The calling environment is responsible for the state of the stack as to whether it is cleared or appended to an existing state.
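The append behavior described above can be sketched as follows. This is a standalone illustration and not the Isis code: the function name loadLines is hypothetical and a std::vector stands in for the internal line stack.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for load(): appends every line from `in` to `lines`
// in the order read. Clearing `lines` beforehand is left to the caller.
std::istream &loadLines(std::istream &in, std::vector<std::string> &lines) {
  std::string line;
  while (std::getline(in, line)) lines.push_back(line);
  return in;
}
```

Calling the sketch twice on the same container, for example with two std::istringstream sources, appends the second source after the first.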
| ifile | Input source stream of lines of text |
Definition at line 406 of file CSVReader.cpp.
References _FILEINFO_, _lines, iline, and Isis::iException::Message().
Referenced by Isis::operator>>().
| void Isis::CSVReader::read | ( | const std::string & | csvfile | ) | throw (iException &) |
Reads the entire contents of a file for subsequent parsing.
This method opens the specified file and reads every line, storing them in this object. It is assumed this file is a text file. Other methods in this class can be utilized to set parsing conditions before or after the file has been read.
Note that parsing the file is deferred until explicitly invoked through other methods in this class. Users of this class can extract individual rows, columns or the complete table.
This object is reentrant. Additional files can be read in. Any existing data from previous input sources is discarded upon subsequent reads.
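The replace-on-read behavior can be sketched as follows. This is an illustration under stated assumptions: the function name readFile is hypothetical, std::runtime_error stands in for the Isis exception type, and the point at which the real method discards previous data is not specified in this documentation.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for read(): replaces `lines` with the file contents.
// std::runtime_error stands in for the Isis exception type.
void readFile(const std::string &fileName, std::vector<std::string> &lines) {
  std::ifstream in(fileName.c_str());
  if (!in) throw std::runtime_error("Unable to open file [" + fileName + "]");
  lines.clear();  // discard data from any previous source
  std::string line;
  while (std::getline(in, line)) lines.push_back(line);
}
```

In this sketch the previous contents are discarded only after the file has been opened successfully, so a failed open leaves the container unchanged. That ordering is a design choice of the sketch.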
| csvfile | Name of file to read |
Definition at line 153 of file CSVReader.cpp.
References _FILEINFO_, in, and Isis::iException::Message().
Referenced by CSVReader().
| int Isis::CSVReader::rows | ( | ) | const [inline] |
Reports the number of rows in the table.
This method returns only the number of rows of data. This count does not include skipped lines or the header line if either exists. Note that if no lines are skipped and no header exists, this count will be identical to size().
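The relationship between size(), the skip count, the header flag and the row count can be sketched with two small functions. The layout and arithmetic are assumptions inferred from the descriptions on this page, not a copy of the Isis source, and the free-function form and the name dataRows are hypothetical.

```cpp
#include <cassert>

// Hypothetical sketch of the row bookkeeping described on this page.
// Layout assumed: [skipped lines][optional header][data rows].
int firstRowIndex(int skip, bool header) { return skip + (header ? 1 : 0); }

// Number of data rows among `size` stored lines; never negative.
int dataRows(int size, int skip, bool header) {
  const int first = firstRowIndex(skip, header);
  return size > first ? size - first : 0;
}
```

With 10 stored lines, 2 skipped lines and a header, the first data row sits at index 3 and 7 data rows remain. With no skipped lines and no header, the row count equals the line count.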
Definition at line 273 of file CSVReader.h.
References _lines, and firstRowIndex().
Referenced by columns(), getColumn(), getHeader(), getRow(), and getTable().
| void Isis::CSVReader::setDelimiter | ( | const char & | delimiter | ) | [inline] |
Set the delimiter character that separates tokens in the strings.
This method allows the user of this class to indicate the character that separates individual tokens in each row, including the header line.
One must ensure the delimiter character is not within tokens (such as comma delimited strings) or incorrect parsing will occur.
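The hazard can be demonstrated with a naive single-character split, shown here as a standalone sketch. The helper name naiveSplit is hypothetical, and whether the Isis parser treats quoted text specially is not stated in this documentation; the sketch assumes no quoting support.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: naive split on one character with no quoting support.
std::vector<std::string> naiveSplit(const std::string &line, char delimiter) {
  std::vector<std::string> tokens;
  std::istringstream in(line);
  std::string token;
  while (std::getline(in, token, delimiter)) tokens.push_back(token);
  return tokens;
}
```

Splitting the line "Smith, John",42 (quotes included) on a comma yields three tokens rather than the intended two. Choosing a delimiter that does not occur in the data, such as a semicolon in Smith John;42, gives the expected two tokens.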
| delimiter | Single character that delimits tokens in each string |
Definition at line 352 of file CSVReader.h.
References _delimiter.
| void Isis::CSVReader::setHeader | ( | const bool | gotIt = true | ) | [inline] |
Allows the user to indicate header disposition.
The determination of a header is entirely up to the user of this class. If a header exists, the user must indicate this with a true parameter to this method. That line is excluded from the row-by-row and column data parsing operations. If no header exists, provide false to this method.
It is assumed that headers exist immediately prior to data rows and any skipped lines precede the header line. Only one line is presumed to be a header.
Note that this method can be set at any time in the process of reading from a file or stream source as parsing is done on demand and not at the time the source is read in.
| gotIt | True indicates the presence of a header, false indicates one does not exist. |
Definition at line 337 of file CSVReader.h.
References _header.
| void Isis::CSVReader::setKeepEmptyParts | ( | ) | [inline] |
Indicate multiple occurrences of delimiters are empty tokens.
Use of this method indicates that when multiple instances of the delimiting character occur in succession, they should be treated as empty tokens. This is useful when input sources truly have empty fields.
Definition at line 369 of file CSVReader.h.
References _keepParts.
| void Isis::CSVReader::setSkip | ( | int | nskip | ) | [inline] |
Indicate the number of lines at the top of the source to skip to data.
This method allows the user to indicate the number of lines that are to be ignored at the beginning of the input source. These lines may contain any text, but are persistently ignored for all row and column parsing operations.
Note that this should not include a header line if one exists as the header methods maintain that information for parsing operations. It is assumed that header lines always follow skipped lines and immediately precede data lines.
| nskip | Number of lines to skip |
Definition at line 296 of file CSVReader.h.
References _skip.
| void Isis::CSVReader::setSkipEmptyParts | ( | ) | [inline] |
Indicate multiple occurrences of delimiters are one token.
Use of this method indicates that when multiple instances of the delimiting character occur in succession, they should be treated as a single delimiter. This is useful when input sources have space separated tokens. Frequently, there are many spaces between values when spaces are used as the delimiting character. Call this method when spaces are used as token delimiters.
Definition at line 381 of file CSVReader.h.
References _keepParts.
| int Isis::CSVReader::size | ( | ) | const [inline] |
Reports the total number of lines read from the stream.
Definition at line 261 of file CSVReader.h.
References _lines.
| std::istream& operator>> | ( | std::istream & | is, | CSVReader & | csv | ) | [friend] |
Input read operator for input stream sources.
This input operator can be invoked directly from the users environment to read the complete input source. It can also be used to augment an existing source as this technique does not discard existing data (lines).
It is presumed that any additional input sources are consistent with pre-established parsing guidelines; otherwise, the integrity of the table is compromised.
Here is an example of how to use this method:
ifstream ifile("myfile.csv");
CSVReader csv;
ifile >> csv;
| is | Input stream source | |
| csv | CSVReader object that receives the lines read from the input source |
Definition at line 447 of file CSVReader.cpp.
char Isis::CSVReader::_delimiter [private] |
Separator of values.
Definition at line 415 of file CSVReader.h.
Referenced by getColumn(), getDelimiter(), getHeader(), getRow(), getTable(), and setDelimiter().
bool Isis::CSVReader::_header [private] |
Indicates presence of header.
Definition at line 413 of file CSVReader.h.
Referenced by firstRowIndex(), getHeader(), haveHeader(), and setHeader().
bool Isis::CSVReader::_keepParts [private] |
Keep empty parts between delimiter.
Definition at line 416 of file CSVReader.h.
Referenced by getColumn(), getHeader(), getRow(), getTable(), keepEmptyParts(), setKeepEmptyParts(), and setSkipEmptyParts().
CSVList Isis::CSVReader::_lines [private] |
List of lines from file.
Definition at line 417 of file CSVReader.h.
Referenced by clear(), getColumn(), getHeader(), getRow(), getTable(), load(), rows(), and size().
int Isis::CSVReader::_skip [private] |
Number of lines to skip.
Definition at line 414 of file CSVReader.h.
Referenced by firstRowIndex(), getHeader(), getSkip(), and setSkip().