Data Feed Glossary

Data Feed Glossary or Dictionary

  • Delimiter
    A delimiter is the character that separates the columns inside a data feed file. Because each data feed file usually contains multiple columns, some character must mark where one column ends and the next begins. This is called the delimiter or delimiter character. Common delimiter characters are TAB, COMMA (CSV), PIPE and SPACE.
  • TAB data feed file format
    TAB data feed file format means the data feed file uses TAB (ASCII character 9, written \t in many programming languages) to separate the columns.
  • CSV data feed file format
    CSV data feed file format means the data feed file uses CSV (comma-separated values), that is, a comma (,), to separate the columns. CSV is the most popular type of data feed format.

    Separating columns with a comma (,) can sometimes be problematic, because the column content may itself contain commas. To solve this, the CSV format usually encapsulates (wraps) such column values in double quotes. For example:

    ...,some column value, another value,"column value, with comma",...

    Even though CSV is the most popular, I personally don't like CSV because of the complexity of double-quote encapsulation. I prefer the TAB delimited format.
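
    The double-quote encapsulation described above is normally handled by a CSV library rather than by hand. Here is a minimal sketch using Python's standard csv module; the sample values are illustrative only and are not taken from any real feed.

```python
import csv
import io

# A sample row in which one value contains a comma (illustrative data).
row = ["some column value", "another value", "column value, with comma"]

# Writing: by default csv.writer adds double quotes only where they are needed.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue()

# Reading: csv.reader removes the quotes and restores the original values.
parsed = next(csv.reader(io.StringIO(line)))
print(line, parsed)
```

    Only the third value ends up wrapped in double quotes, and reading the line back gives the same three values.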
  • PIPE data feed file format
    PIPE data feed file format means the data feed file uses | (the vertical line, the symbol on top of the \ button on your keyboard) to separate the columns.

    PIPE separation is quite nice because it makes the data easier to read.
  • SPACE data feed file format
    SPACE data feed format means the data feed file uses SPACE (ASCII character 32) to separate the columns. Since the SPACE character is also commonly used inside column content, the only way to ensure column separation is to use fixed column spacing. Fixed column spacing means the person generating the feed must decide on a fixed number of characters for each column. For example, suppose each column in the feed always has 100 characters. Then even if the column content is only 30 characters long, 70 spaces must be padded to the right of it to fill the full 100-character column.

    Using SPACE as a delimiter is not common and not recommended because of its complexity for the recipient.
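
    The padding rule above can be sketched in a few lines of Python. The column widths and the sample values are hypothetical, and cutting off values that are too long is an assumption of this sketch; a real feed specification would define both.

```python
# Hypothetical column widths; a real feed specification would define these.
WIDTHS = [10, 20, 8]

def format_fixed_width(values, widths=WIDTHS):
    """Pad each value with spaces on the right up to its column width."""
    # Assumption: values longer than the column are cut off to keep alignment.
    return "".join(str(v)[:w].ljust(w) for v, w in zip(values, widths))

def parse_fixed_width(line, widths=WIDTHS):
    """Slice a line at the fixed column boundaries and strip the padding."""
    values, start = [], 0
    for w in widths:
        values.append(line[start:start + w].rstrip(" "))
        start += w
    return values

line = format_fixed_width(["SKU-1", "Blue cotton shirt", "19.99"])
print(repr(line))
print(parse_fixed_width(line))
```

    Note that the recipient has to know the same widths in order to read the file, which is the complexity mentioned above.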
  • Line Break
    Line breaks are characters used to separate lines inside a data feed file. Computers see the content of a text file as one long sequence of characters. Humans cannot read one giant line of text, so lines or rows are needed; that is where line break characters come in.

    Common line break characters are Carriage Return (CR), Line Feed (LF), Carriage Return + Line Feed (CR+LF).

    Only one kind of line break should be used within a particular text file.

    Which line break is used in a data feed file depends on the operating system used to generate it (or on the programmer).

    List of operating systems and their default text file line break:
    - PC / Windows uses CR + LF line break
    - Apple / classic Mac OS (version 9 and earlier) uses CR line break
    - Apple / Mac OS X (macOS) uses LF line break
    - Linux uses LF line break
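
    As a rough illustration, the line break used by a feed can be detected and converted by working on the raw bytes of the file. The function names below are hypothetical, and the detection assumes the file uses only one kind of line break.

```python
def detect_line_break(data: bytes) -> str:
    """Guess the line break style, assuming the file uses only one kind."""
    # Check CR+LF first, because it contains both of the other two.
    if b"\r\n" in data:
        return "CR+LF"
    if b"\n" in data:
        return "LF"
    if b"\r" in data:
        return "CR"
    return "none"

def normalize_line_breaks(data: bytes, target: bytes = b"\n") -> bytes:
    """Convert every CR+LF, CR, or LF in the data to the target line break."""
    # Collapse CR+LF first so that its CR is not converted a second time.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", target)

sample = b"id\ttitle\r\n1\tShirt\r\n"
print(detect_line_break(sample))
print(normalize_line_breaks(sample))
```

    Working on bytes avoids the automatic line break translation that Python applies when a file is opened in text mode.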
  • Carriage Return (CR)
    Carriage Return is ASCII character 13, which was used by classic Apple / Mac operating systems (Mac OS 9 and earlier) to separate lines (line break). The name comes from the old typewriter, where the carriage is pulled back to its original position, ready to type a new line (hence the name carriage return).

    Carriage return combined with line feed (LF) is used by PC / Windows operating systems as their line break.
  • Line Feed (LF)
    Line feed is ASCII character 10, which is used by Linux (and by Mac OS X / macOS) to separate lines (line break). LF is also part of the PC / Windows line break, where it is combined with carriage return as CR + LF.
  • Data feed specification
    A data feed specification is the set of rules for generating a particular data feed for a particular data feed recipient. A data feed specification should include:
    - data feed format (delimiter, line-break, encoding)
    - header requirements
    - rules for each column
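
    One possible way to write such a specification down in code is a small data structure. This is only a sketch: the class names, field names, and sample columns are all hypothetical and do not correspond to any real recipient's specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ColumnRule:
    """Hypothetical rule for one column of the feed."""
    name: str
    required: bool = True
    max_length: Optional[int] = None

@dataclass
class FeedSpecification:
    """Hypothetical container for the three parts listed above."""
    delimiter: str = "\t"         # data feed format: delimiter
    line_break: str = "\n"        # data feed format: line break
    encoding: str = "utf-8"       # data feed format: character encoding
    header_required: bool = True  # header requirements
    columns: List[ColumnRule] = field(default_factory=list)  # column rules

    def check_header(self, header_line: str) -> bool:
        """Return True if the header names match the column rules in order."""
        names = header_line.rstrip("\r\n").split(self.delimiter)
        return names == [column.name for column in self.columns]

spec = FeedSpecification(columns=[ColumnRule("id"), ColumnRule("title", max_length=70)])
print(spec.check_header("id\ttitle\n"))
```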
  • Data feed format
    Data feed format describes the technical requirements for a particular data feed file, such as which delimiter character is used, which line break is needed, and which character encoding is used. Data feed format alone is not enough to completely describe a data feed; more information, such as header requirements and rules for each column, is also needed (see Data feed specification).
  • Data feed management
    Data feed management is a professional service for managing your data feed. Managing means helping you generate a data feed file, editing your data feed file, exporting your data feed into different formats, and converting your feed to whatever data feed specification your recipient needs.
  • Data feed conversion service
    A data feed conversion service is an online service that helps you convert one data feed to multiple data feed formats or specifications. For example, let's say you have an e-commerce store and you would like to convert your data feed so you can upload it to Google Shopping, PriceGrabber, Shopzilla, etc.
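
    The simplest kind of conversion, changing only the delimiter, can be sketched with Python's standard csv module. The function name and sample data are hypothetical, and this covers the format only; converting to a recipient's full specification would also involve their header and column rules.

```python
import csv
import io

def convert_delimiter(text, source=",", target="\t"):
    """Re-write delimited text with a different delimiter (a minimal sketch)."""
    # newline="" leaves the line breaks untouched so the csv module handles them.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=source)
    output = io.StringIO(newline="")
    writer = csv.writer(output, delimiter=target, lineterminator="\n")
    writer.writerows(reader)
    return output.getvalue()

csv_feed = 'id,title\n1,"Shirt, blue"\n'
print(convert_delimiter(csv_feed))
```

    In the TAB version the title no longer needs double quotes, because the comma is no longer the delimiter.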
