# Format of Output File

File format for output files:

s_N_export.txt

This file contains the results of alignment of all reads in the lane. It has a line for every read, not just those that pass purity filtering. The last field on each line is a flag indicating whether the read passed the filter (Y or N).

s_N_sorted.txt

(deprecated)

This output file is similar to s_N_export.txt, except that it contains only entries for reads that pass purity filtering and have a unique alignment in the reference. The entries are sorted by alignment position, which is meant to facilitate the extraction of ranges of reads for visualization or SNP calling.

This file is only produced if the flag WITH_SORTED is used.

Paired-read sequencing output:

s_N_1_sequence.txt, s_N_2_sequence.txt

These parallel sets of files contain filtered sequences for each lane.

These files are only produced if the flag WITH_SEQUENCE is used.

s_N_1_export.txt, s_N_2_export.txt

These parallel sets of files contain the results of alignment of all reads in the lane. The fields are tab separated to facilitate export to databases. Each file has a line for every read, not just those that pass purity filtering. The last field on each line is a flag indicating whether the read passed the filter (Y or N).

s_N_1_sorted.txt, s_N_2_sorted.txt

(deprecated)

These parallel sets of files are similar to s_N_1_export.txt and s_N_2_export.txt, except that they contain only entries for reads that pass purity filtering and have a unique alignment in the reference. The entries are sorted by alignment position, which is meant to facilitate the extraction of ranges of reads for visualization or SNP calling.

These files are only produced if the flag WITH_SORTED is used.

s_N_anomaly.txt

This file contains one line for each read for which the two halves of the read did not align with a nominal distance and orientation from each other. This is the file to mine for structural variation information.

Each line of an export file contains the following fields. Not all fields are relevant to a single-read analysis.

1. Machine (Parsed from Run Folder name)

2. Run Number (Parsed from Run Folder name)

3. Lane

4. Tile

5. X Coordinate of cluster. As of RTA v1.6, OLB v1.6, and CASAVA v1.6, the X and Y coordinates for each cluster are calculated in a way that makes sure the combination will be unique. The new coordinates are the old coordinates multiplied by 10, plus 1000, and then rounded.

6. Y Coordinate of cluster. As of RTA v1.6, OLB v1.6, and CASAVA v1.6, the X and Y coordinates for each cluster are calculated in a way that makes sure the combination will be unique. The new coordinates are the old coordinates multiplied by 10, plus 1000, and then rounded.
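The coordinate conversion described for fields 5 and 6 can be sketched as follows. The function name `new_coordinate` is hypothetical, and the text does not say which rounding rule is used, so Python's built-in `round` is only an assumption here.

```python
def new_coordinate(old_coordinate):
    """Convert an old cluster coordinate to the newer integer form.

    Follows the rule in the text: multiply by 10, add 1000, then round.
    The rounding rule (Python's built-in round) is an assumption.
    """
    return int(round(old_coordinate * 10 + 1000))


# Hypothetical coordinates, for illustration only.
print(new_coordinate(12.34))   # 12.34 * 10 + 1000 = 1123.4 -> 1123
print(new_coordinate(-5.26))   # -5.26 * 10 + 1000 = 947.4  -> 947
```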

7. Index sequence or 0. For no indexing, or for a file that has not been demultiplexed yet, this field should have a value of 0.

8. Read number (1 for single reads; 1 or 2 for paired ends or multiplexed single reads; 1, 2, or 3 for multiplexed paired ends)

9. Called sequence of read

10. Quality string--In symbolic ASCII format (ASCII character code = quality value + 64)
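As an illustration of the encoding in field 10, the following minimal sketch converts a quality string back to numeric quality values by subtracting 64 from each character code. The function name `decode_quality` and the sample string are hypothetical.

```python
def decode_quality(quality_string, offset=64):
    """Return one integer quality value per base.

    Each character encodes a quality value as
    ASCII character code = quality value + offset.
    """
    return [ord(ch) - offset for ch in quality_string]


# 'h' has ASCII code 104, so it decodes to 40; '@' (code 64) decodes to 0.
print(decode_quality("hhhgb@"))  # [40, 40, 40, 39, 34, 0]
```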

11. Match chromosome--Name of chromosome match OR code indicating why no match resulted (RM = repeat masked, for example a match against abundant sequences; NM = not matched)

12. Match Contig--Gives the contig name if there is a match and the match chromosome is split into contigs (Blank if no match found)

13. Match Position--Always with respect to forward strand, numbering starts at 1 (Blank if no match found)

14. Match Strand--"F" for forward, "R" for reverse (Blank if no match found)

15. Match Descriptor--Concise description of alignment (Blank if no match found)

  • A numeral denotes a run of matching bases
  • A letter denotes substitution of a nucleotide: For a 35 base read, "35" denotes an exact match and "32C2" denotes substitution of a "C" at the 33rd position
  • The escape sequence "^..$" represents an indel. An integer in the indel escape sequence (e.g. "10^2$18") indicates an insertion relative to the reference of the specified size. A sequence in the indel escape sequence (e.g. "10^AG$20") indicates a deletion relative to the reference, where the given sequence is the deleted reference sequence.
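The match descriptor grammar above can be tokenized with a small regular expression. This is a minimal sketch rather than a reference implementation: the function name `parse_match_descriptor` and the operation labels are hypothetical, and the sketch only splits the descriptor into its parts without interpreting them further.

```python
import re

# One token is a run of digits (matching bases), a single letter
# (substitution), "^<digits>$" (insertion relative to the reference),
# or "^<letters>$" (deletion relative to the reference).
_TOKEN = re.compile(
    r"(?P<match>\d+)"
    r"|(?P<sub>[A-Za-z])"
    r"|\^(?P<ins>\d+)\$"
    r"|\^(?P<deletion>[A-Za-z]+)\$"
)


def parse_match_descriptor(descriptor):
    """Split a match descriptor into a list of (operation, value) tuples."""
    ops = []
    pos = 0
    while pos < len(descriptor):
        m = _TOKEN.match(descriptor, pos)
        if m is None:
            raise ValueError(
                "unexpected character at offset %d in %r" % (pos, descriptor)
            )
        if m.group("match") is not None:
            ops.append(("match", int(m.group("match"))))
        elif m.group("sub") is not None:
            ops.append(("substitution", m.group("sub")))
        elif m.group("ins") is not None:
            ops.append(("insertion", int(m.group("ins"))))
        else:
            ops.append(("deletion", m.group("deletion")))
        pos = m.end()
    return ops


print(parse_match_descriptor("32C2"))
# [('match', 32), ('substitution', 'C'), ('match', 2)]
print(parse_match_descriptor("10^AG$20"))
# [('match', 10), ('deletion', 'AG'), ('match', 20)]
```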

16. Single-Read Alignment Score--Alignment score of a single-read match, or for a paired read, alignment score of a read if it were treated as a single read. Blank if no match found; any score less than 4 should be considered as aligned to a repeat. -1 for shadow reads.

17. Paired-Read Alignment Score--Alignment score of a paired read and its partner, taken as a pair. Blank if no match found; any score less than 4 should be considered as aligned to a repeat. Note that in single-read analyses it is always blank.

18. Partner Chromosome--Name of the chromosome if the read is paired and its partner aligns to another chromosome

19. Partner Contig

  • Not blank if read is paired and its partner aligns to another chromosome and that partner is split into contigs.
  • Blank for single-read analysis

20. Partner Offset

  • If a partner of a paired read aligns to the same chromosome and contig, this number, added to the Match Position, gives the alignment position of the partner.
  • If partner is a shadow read, this value is 0.
  • If partner aligns to a different chromosome and/or contig, the number represents the absolute position of the partner.
  • Blank for single-read analysis unless the record belongs to a part of a spliced RNA read.
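The Partner Offset rules can be combined into one helper that returns the partner's alignment position. This is only a sketch under stated assumptions: the function name `partner_position` is hypothetical, a blank Partner Chromosome and Partner Contig are taken to mean "same chromosome and contig", and a Partner Strand of "N" is taken to mean the partner did not align (a shadow read). The spliced RNA read case is not handled.

```python
def partner_position(match_position, partner_chromosome, partner_contig,
                     partner_offset, partner_strand):
    """Return the partner's alignment position, or None if unavailable.

    All arguments are the raw string fields from one export line.
    """
    if partner_offset == "":
        # Blank offset: single-read analysis (assumption: nothing to compute).
        return None
    if partner_strand == "N":
        # Assumption: "N" marks an unaligned (shadow) partner; offset is 0.
        return None
    offset = int(partner_offset)
    if partner_chromosome or partner_contig:
        # Partner is on a different chromosome and/or contig:
        # the offset is already an absolute position.
        return offset
    # Same chromosome and contig: the offset is relative to Match Position.
    return int(match_position) + offset


# Hypothetical field values, for illustration only.
print(partner_position("1000", "", "", "150", "R"))       # 1150
print(partner_position("1000", "chr2", "", "5000", "F"))  # 5000
```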

21. Partner Strand--To which strand did the partner of the paired read align?

  • "F" for forward, "R" for reverse ("N" if no match found, blank for single-read analysis)

22. Filtering--Did the read pass filtering? "Y" for yes, "N" for no.
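The 22 fields above map naturally onto a named record. The following minimal sketch splits one tab-separated export line into such a record; the names `ExportRecord`, `parse_export_line`, and `passed_filter` are hypothetical, and the sample line is made up for illustration.

```python
from collections import namedtuple

# Field names follow the numbered list above, in order.
FIELD_NAMES = [
    "machine", "run_number", "lane", "tile", "x", "y", "index",
    "read_number", "sequence", "quality", "match_chromosome",
    "match_contig", "match_position", "match_strand", "match_descriptor",
    "single_read_score", "paired_read_score", "partner_chromosome",
    "partner_contig", "partner_offset", "partner_strand", "filtering",
]
ExportRecord = namedtuple("ExportRecord", FIELD_NAMES)


def parse_export_line(line):
    """Split one tab-separated export line into an ExportRecord of strings."""
    # Strip only the line terminator, so blank fields are preserved.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELD_NAMES), len(fields))
        )
    return ExportRecord(*fields)


def passed_filter(record):
    """Return True if the last field marks the read as passing the filter.

    "Y" marks a passing read; "1" is also accepted as a defensive assumption.
    """
    return record.filtering in ("Y", "1")


# A made-up single-read line with blank partner fields.
sample = "\t".join([
    "HWI-EAS1", "42", "3", "1", "1023", "1500", "0", "1",
    "ACGT", "hhhh", "chr1", "", "12345", "F", "4", "60",
    "", "", "", "", "", "Y",
])
record = parse_export_line(sample)
print(record.match_chromosome, record.match_position, passed_filter(record))
```

All values are kept as strings because several fields may be blank; convert them to integers only after checking that they are not empty.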

CC BY-NC 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.