4.3.4.1. CSV – Babelway

A CSV or comma-separated values file is used for the digital storage of data organized in a table form. Each line in the CSV file corresponds to a row in the table, and within a line, fields are separated by commas (or semicolons, colons or tabs).

As this is a very common and simple file format, CSV files are often used to move tabular data between different computer programs, such as between a database and a spreadsheet program.

Following is a sample CSV file.

NAV Date,ISIN,Fund Name,Share Typ,Fund Ccy,NAV in Fund Ccy,Bid Price in Fund Ccy,Offer Price in Fund Ccy,TNA,Outstanding,Publication Ccy,NAV in Publication Ccy,Bid Price in Publication Ccy,Offer Price in Publication Ccy,Equity Profit,WKN Code,Zwischengewinn,TIS,TID
06/13/2008,LU0223208157,HSBC GIF EMERGING EUROPE EQUITY,AC,EUR,15.088,15.088,15.924,12190498.17,807980.236,GBP,11.96,11.96,12.623,37.650,A0ER9A,0.001,,

More details can be found on Wikipedia.

Specific properties

Delimiter character	The character that is used to separate the fields in the csv. It can be a comma, colon, semicolon or tab.
Use header matching	Only for IN. When checked, matching of the columns will be done based on the label in the header line, instead of the default positional matching. More details in this specific section.
Use RFC4180 parser	Only for IN. If checked, the parsing of the CSV will strictly follow the RFC4180. For example, the RFC4180 specification does not have a concept of an escape character, so double quotes inside a quoted field must be represented by two consecutive double quotes and backslash will be parsed as if it was a normal character. If this option is not checked, the parser will be more lenient and accept some files that are not strictly compliant with the RFC4180.
Always quote	Only for OUT. In csv, quoting of fields (surrounding the value by ") is required when the value contains some special characters (separator, new line or double quote). In all other cases, quoting the value is allowed but not required. This option allows you to decide if you also want quoting in that case.

Meaningful names

Heaving meaningful names in your message definition tree is very important to recognize all the nodes easily. It will help you a lot, especially when using the fields in the mapping.

With CSV MessageDefinitions, you can freely rename the fields in the MessageDefinition tree, and system will automatically adapt and use these names. The names that you have chosen will also be used in the internal XML representation of the CSV.

In the below example, row element has been renamed to Country, and column elements to Name, Population, ... These names will help identifying the nodes in the mapping tree, and are used in the internal XML. The names that you choose do not have to match the name of the labels in the input CSVs.

The same way, you can freely rename any element in your CSV MessageDefinition OUT. These meaningful names will help you understanding your MessageDefinition, and will have no impact on the generated output CSV. The only thing that matters about the internal XML OUT is that it has only elements (not attributes), and is 3-level deep, where the first level is the xml root element, the second level elements will be mapped to rows, and the third level to columns in the rows.

Support for leading and trailing static lines

CSV files typically consist of an optional header line, then a loop of similar lines. But regularly, there will be a few leading or trailing static lines in the file, that contain different data, and can have a different structure.

For an example, a csv file describing an invoice could contain a few leading lines with the info about the sender and the recipient, then a loop of lines containing the info about the invoice lines, and eventually a last line with some totals.

To be able to make the mapping easily, it is very important to have a MessageDefinition tree that have separated fields for these distinct fields in these static lines.

To have that, you can freely create static lines in your MessageDefinition tree, before or after the loop. If you do so, for example adding 2 row nodes before the loop and 1 after the loop, the system will automatically associate the first 2 rows in the file and the last one to these specific nodes, and only map the loop node the remaining lines of the input file.

In short, you can edit your tree as much as you want to make it easier, as long as it remains limited to 3 levels of nodes (root, row, column), and that only one type of row is under a loop. The different row elements may contain different number of columns.

Positional or header matching

Babelway supports 2 ways of making the matching between columns in the input CSV and fields of the MessageDefinition :

Positional matching : N-th column in the file will match N-th field of the row in the MessageDefinition. This is the default mode.
Header matching : fields will match based on the label of the column. To use this mode, you need to check the box "Use header matching".

For header matching, you need to have a label row just before the loop node, containing a static value for every column. The field will match the column of the incoming that have this exact label, whatever its place.

Fields in the input file that do not match any field label will just be ignored.

For example, using header matching, the structure below would accept the 3 displayed files, and produce the exact same output (internal Xml in representation) for all of them. Difference in the order of the fields has no importance (because label is the same as the expected one), and additional column (Main city) will be ignored (because not present in the tree).

Knowing which mode you should use really depends on the assertions that you can make on the input file, even for the future.

Header matching will continue to work without any change even if your provider adds some columns, even in the middle of the file, but will not if your provider changes the labels.
Positional matching will continue to work if the label changes, but not if order of columns is changed.

Useless columns can be ignored

It happens regularly that you need to process CSV input files that contain many columns, but that you don't need all of them.

In that case, it is recommended that you only define in your message definition tree the fields that you need. The system will completely ignore the other columns.

The benefits are :

Your message definition tree will be much simpler, and your life will be much easier when you will have to manipulate it (make changes in the tree, or make mappings using this tree).
The size of the internal messages in Babelway will be much lower. This can be very important if your csv file is big (like 100 MB), because it can lead to the internal representation (in xml) to be even bigger, and maybe hit the message size limit (currently 250 MB). Skipping useless fields can be required in that case to remain under the limit.

To do that, using header matching is often recommended, because it allows you to skip fields that are before the last needed column. With positional matching, you will only be able to skip trailing columns.

Security

For security reasons, Babelway will not accept to generate files that could be used by an attacker to trigger code in the viewer (typically Excel) of the receiver.

Practically, this means that malicious cell contents will be trimmed to be safe (remove leading characters like =, +, - , | or @ ).