Summary.htm Details

Summary.htm

Review the Summary.htm file after your analysis is complete. The key
parameters to examine are described below.

Clusters

This column contains the average number of clusters per tile detected in the first cycle images.

Fewer clusters than expected:
  • Problem with cluster formation
  • Blurred images: Poor focus or dirty flow cell surface
  • Lots of clusters visible: Cluster density or size is too great to
    distinguish individual objects
More clusters than expected:
  • Too many clusters on the flow cell: Problem with cluster formation
  • Very large clusters: Double counting

Average First Cycle Intensity

Generally, brighter is better, but this result is instrument and sample dependent.

Percentage of First Cycle Intensity Remaining After 20 Cycles of Sequencing

Generally, the higher, the better. The intensity remaining can be sample dependent.

Percentage of Clusters Passing Filters

To remove the least reliable data from the analysis, the raw data can be filtered to remove any clusters that have “too much” intensity corresponding to bases other than the called base. By default, the purity of the signal from each cluster is examined over the first 25 cycles, and chastity = Highest_Intensity / (Highest_Intensity + Next_Highest_Intensity) is calculated for each cycle. Under the new default filter, which is applied at the base calling stage, a cluster passes only if at most one of these cycles has a chastity below the threshold.
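The filter described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline's implementation: the function names are hypothetical, and the threshold value of 0.6 is an assumed placeholder, since this page does not state the threshold.

```python
def chastity(intensities):
    """Chastity for one cycle: highest / (highest + next highest).

    `intensities` holds the four channel intensities for one cluster
    at one cycle.
    """
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    # Guard against a zero denominator (e.g. an empty or blank cycle).
    return top / total if total > 0 else 0.0


def passes_filter(cluster_cycles, threshold=0.6, n_cycles=25, max_failures=1):
    """Return True if at most `max_failures` of the first `n_cycles`
    cycles have chastity below `threshold`.

    threshold=0.6 is an ASSUMED placeholder, not a value from this page.
    """
    failures = sum(
        1 for cycle in cluster_cycles[:n_cycles] if chastity(cycle) < threshold
    )
    return failures <= max_failures


# Hypothetical example data: one clean cycle and one mixed-signal cycle.
clean_cycle = (100, 10, 5, 5)
mixed_cycle = (50, 50, 0, 0)
one_bad = [mixed_cycle] + [clean_cycle] * 24
two_bad = [mixed_cycle] * 2 + [clean_cycle] * 23
```

With these inputs, `passes_filter(one_bad)` keeps the cluster and `passes_filter(two_bad)` rejects it, which matches the "at most one cycle below the threshold" rule.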

The higher the value, the better. This value is very dependent on cluster density, since the major cause of an impure signal in the early cycles is the presence of another cluster within a few micrometers.

Very few clusters passing filter

Possible Cause:
  • Poor flow cell, perhaps unblocked DNA
  • Faint clusters
  • Out of focus
  • Poor matrix
  • A fluidics or sequencing failure
  • Bubbles in individual tiles
  • Too many clusters
  • Large clusters
  • High phasing or prephasing
Suggested Action:
  • Some of these causes may affect only a single cycle. If the problem is isolated to the early cycles used for filtering, the filter may be discarding very good data.
  • Base calling errors may be limited to the affected cycles and, because early cycles are fairly resistant to minor focus and fluidics problems, the number of errors may be small. The filtering can always be set manually to other values. Check before assuming that all the data are poor.

Percentage Error Rate of Clusters Passing Filters

This value should be as low as possible, but it is very dependent on read length. If there is a sudden rise beyond cycle 32, then it is likely that ELANDv2 has effectively filtered out many clusters with more than two errors, thus suppressing the true error rate up to this point. The percentage aligning will also be low.

Percentage of Phasing and Prephasing

Ideally, these values should be as low as possible.

Standard Deviations

Many values have standard deviations associated with them. These can be the first indication of how uniform the flow cell is: high standard deviations indicate variability from tile to tile within a lane.
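As a rough illustration of how tile-to-tile variability shows up in these numbers, the sketch below summarizes a list of per-tile values. The function name and the example cluster counts are hypothetical, and the use of the sample standard deviation is an assumption; this page does not say which estimator the report uses.

```python
from statistics import mean, stdev


def tile_summary(values):
    """Return (mean, standard deviation, coefficient of variation in %)
    for a list of per-tile values from one lane.

    Uses the sample standard deviation (an ASSUMPTION; the report may
    use a different estimator).
    """
    m = mean(values)
    s = stdev(values)
    cv = 100.0 * s / m if m else float("nan")
    return m, s, cv


# Hypothetical per-tile raw cluster counts for two lanes.
uniform_lane = [100000, 102000, 98000, 101000]
uneven_lane = [100000, 40000, 160000, 100000]

uniform_stats = tile_summary(uniform_lane)
uneven_stats = tile_summary(uneven_lane)
```

Both lanes have similar means, but the uneven lane has a much larger standard deviation relative to its mean, which is the pattern to look for when judging flow cell uniformity.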

Percentage of Clusters Passing Filters that Align Uniquely to the Reference Genome

Optimal value depends on the genome sequenced and the read-length; the higher (up to 100% max), the better. This result is genome specific and dependent on the completeness of the reference. A failure to align could be due to repeat or missing regions, or due to indels where sample and reference do not match.

Lane Results Summary

This table displays basic data quality metrics for each lane. Apart from Lane Yield, which is the total value for the lane, all the statistics are given as means and standard deviations over the tiles used in the lane.

  • Clusters (raw)—The number of clusters detected by the image
    analysis module.
  • Clusters (PF)—The number of detected clusters that meet the
    filtering criterion listed in Lane Parameter Summary.
  • 1st Cycle Int (PF)—The average of the four intensities (one per
    channel or base type) measured at the first cycle averaged over
    filtered clusters.
  • % Intensity after 20 cycles (PF)—The corresponding intensity
    statistic at cycle 20 as a percentage of that at the first cycle.
  • % PF Clusters—The percentage of clusters passing filtering.
  • % Align (PF)—The percentage of filtered reads that were uniquely
    aligned to the reference. For eland_rna it is number of PF reads
    aligned to the genome and splice junctions. Reads aligned to abundant
    sequences and masked by eland_rna do not participate in this number.
  • Alignment Score (PF)—The average filtered read alignment score
    (reads with multiple or no alignments effectively contribute scores of
    0). For phiX spikes, the number of reads aligning to PhiX is small and
    therefore the reported alignment score (small number of aligned reads
    divided by total number of PF reads) is usually small.
  • % Error Rate (PF)—The percentage of called bases in aligned reads
    that do not match the reference.

If eland_pair analysis has been specified for one or more lanes, then two Lane Results Summaries are produced, one for each read. All lanes for which analysis has been specified are represented in the Read 1 table, but only those for which eland_pair analysis has been specified contribute statistics to the Read 2 table.
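To make the definitions of the PF metrics concrete, here is a minimal sketch that computes them from per-read records. The `Read` record and `lane_metrics` function are hypothetical and not part of the analysis software. For simplicity the sketch pools all reads in a lane, whereas the report gives means and standard deviations over tiles.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical per-read record used only for this illustration."""
    passed_filter: bool
    unique_alignment: bool
    alignment_score: float  # only counted for uniquely aligned reads
    aligned_bases: int
    mismatches: int


def lane_metrics(reads):
    """Compute % PF, % Align (PF), Alignment Score (PF), % Error Rate (PF)."""
    pf = [r for r in reads if r.passed_filter]
    aligned = [r for r in pf if r.unique_alignment]
    n_raw, n_pf = len(reads), len(pf)
    bases = sum(r.aligned_bases for r in aligned)
    return {
        "pct_pf": 100.0 * n_pf / n_raw if n_raw else 0.0,
        "pct_align_pf": 100.0 * len(aligned) / n_pf if n_pf else 0.0,
        # Reads with multiple or no alignments contribute a score of 0,
        # so the sum over aligned reads is divided by ALL PF reads.
        "alignment_score_pf": (
            sum(r.alignment_score for r in aligned) / n_pf if n_pf else 0.0
        ),
        "pct_error_pf": (
            100.0 * sum(r.mismatches for r in aligned) / bases if bases else 0.0
        ),
    }


# Hypothetical example: four clusters, three pass filter, two align uniquely.
example_reads = [
    Read(True, True, 100.0, 36, 1),
    Read(True, True, 50.0, 36, 0),
    Read(True, False, 0.0, 0, 0),
    Read(False, False, 0.0, 0, 0),
]
example_metrics = lane_metrics(example_reads)
```

Dividing the score sum by all PF reads is also why a lane with only a small PhiX spike reports a small alignment score: few reads contribute a non-zero score.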

Expanded Lane Summary

This displays more detailed quality metrics for each lane. Apart from the phasing and prephasing information, all values are tile means for the lane.

  • Clusters (tile mean) (raw)—The number of clusters detected by the
    image analysis module.
  • % Phasing—The estimated (or specified) value used for the
    percentage of molecules in a cluster for which sequencing falls behind
    the current position (cycle) within a read.
  • % Prephasing—The estimated (specification is not recommended)
    value used for the percentage of molecules in a cluster for which
    sequencing jumps ahead of the current position (cycle) within a read.
  • % Error Rate (raw)—The percentage of called bases in aligned
    reads from all detected clusters that do not match the reference.
  • Equiv Perfect Clusters (raw)—The number of clusters in the ideal
    situation of read base perfectly predicting reference base that would
    provide the same information content (entropy of reference base given
    read base and a prior assumption of equiprobable reference bases) as
    calculated for all actual detected clusters.
  • % Retained—The percentage of clusters that passed filtering.
  • Cycle 2-4 Av Int (PF)—The intensity averaged over cycles 2, 3,
    and 4 for clusters that passed filtering.
  • Cycle 2-10 Av % Loss (PF)—The average percentage intensity drop
    per cycle over cycles 2–10 (derived from a best fit straight line for
    log intensity versus cycle number).
  • Cycle 10-20 Av % Loss (PF)—The average percentage intensity drop
    per cycle over cycles 10–20 (derived from a best fit straight line for
    log intensity versus cycle number).
  • % Align (PF)—The percentage of filtered reads that were uniquely
    aligned to the reference.
  • % Error Rate (PF)—The percentage of called bases in aligned
    filtered reads that do not match the reference.
  • Equiv Perfect Clusters (PF)—The number of clusters in the ideal
    situation of read base perfectly predicting reference base that would
    provide the same information content (entropy of reference base given
    read base and a prior assumption of equiprobable reference bases) as
    calculated for the actual clusters that passed filtering.

If eland_pair analysis has been specified for one or more lanes, then two Expanded Lane Results Summaries are produced, one for each read. All lanes for which analysis has been specified are represented in the Read 1 table, but only those for which eland_pair analysis has been specified contribute statistics to the Read 2 table.
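The "Av % Loss" columns are described above as coming from a best-fit straight line for log intensity versus cycle number. The sketch below shows one way such a value could be obtained with an ordinary least-squares fit. The function name is hypothetical, and converting the fitted slope to a percentage with 100 × (1 − e^slope) is an assumption about how the slope is reported.

```python
import math


def avg_pct_loss_per_cycle(cycles, intensities):
    """Fit log(intensity) = a + b * cycle by least squares and return the
    average percentage intensity drop per cycle implied by the slope b.

    The slope-to-percentage conversion, 100 * (1 - exp(b)), is an
    ASSUMPTION for this illustration. Intensities must be positive.
    """
    logs = [math.log(i) for i in intensities]
    n = len(cycles)
    mean_x = sum(cycles) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, logs))
    slope = sxy / sxx
    return 100.0 * (1.0 - math.exp(slope))


# Hypothetical intensities that decay by exactly 2% per cycle over cycles 2-10.
cycles_2_10 = list(range(2, 11))
decaying = [1000.0 * 0.98 ** (c - 2) for c in cycles_2_10]
loss_2_10 = avg_pct_loss_per_cycle(cycles_2_10, decaying)
```

For data that decay by a constant fraction each cycle, the log intensities fall on a straight line, so the fit recovers the per-cycle loss (here, about 2%).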

Pair Summary

For lanes for which eland_pair analysis was performed, there are two per-tile summary tables (one for each read). These tables are preceded by a set of tables collectively entitled the Pair Summary. The Pair Summary tables provide statistics about the alignment outcomes of the two reads individually and as a pair, the latter including relative orientation and separation (insert size) of partner read alignments.

The following tables are displayed in Pair Summary:

  • Relative Orientation Statistics
  • Insert Size Statistics
  • Insert Statistics (% of individually uniquely alignable pairs)

These tables are described as follows:

  • Relative Orientation Statistics—The relative orientation of a
    pair is the orientation of read 2 relative to the orientation of read
    1, based on the definition that the read 1 orientation is forward. The
    relative orientation is defined as positive if the read 2 position is
    greater than the read 1 position. These statistics are given only for
    those pairs in which both reads were individually uniquely aligned,
    since these are the reads used to determine the predominant relative
    orientation. Other orientations are considered anomalous and are
    filtered out.
  • The symbols used in the column headings are intended as a visual
    reminder of the definitions of the four possible relative orientations.
    In the example below, the nominal orientation is correctly computed as
    the two reads “pointing to” each other, as expected for the standard
    Illumina short insert paired-read sample prep. Unlike these short
    insert pairs, which show a predominance of opposite and inwardly facing
    read pairs (R+: > R1 R2 <), large insert mate pair libraries are
    expected to produce a predominance of opposite and outwardly facing
    read pairs (R-: < R2 R1 >). High frequencies of paired reads having
    the same orientation (F-: > R2 R1 > or F+: > R1 R2 >) may
    indicate a sample preparation problem, or an adapter read-through
    problem that occurs when the read lengths are long relative to the
    library insert size.
  • Insert Size Statistics—Statistics are derived from the insert
    sizes of those pairs in which both reads were individually uniquely
    aligned and have the predominant relative orientation. First, the
    median is determined. Then, a standard deviation value is determined
    independently for those values below the median and those above it.
    The lower and upper thresholds for acceptable insert sizes are then
    defined as three of the relevant standard deviations below and above
    the median, respectively.
  • Insert Statistics (% of individually uniquely alignable
    pairs)—This table shows the number of inserts (out of those used to
    calculate insert size statistics) considered acceptable in size and of
    those falling outside the thresholds displayed in the Insert Size
    Statistics table. The percentages are relative to the original number
    of pairs in which both reads were individually uniquely aligned.
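The insert size thresholds described above (median, separate standard deviations below and above it, then three standard deviations either side) can be sketched as follows. The function names and example sizes are hypothetical, and measuring each one-sided deviation about the median is an assumption, since this page does not give the exact formula.

```python
import math
from statistics import median


def insert_size_thresholds(sizes, n_sd=3):
    """Return (low_threshold, median, high_threshold) for insert sizes.

    A separate standard deviation is computed for the values below and
    above the median. Deviations are measured about the median, which is
    an ASSUMPTION for this illustration.
    """
    med = median(sizes)

    def sd_about_median(values):
        if not values:
            return 0.0
        return math.sqrt(sum((v - med) ** 2 for v in values) / len(values))

    low_sd = sd_about_median([s for s in sizes if s < med])
    high_sd = sd_about_median([s for s in sizes if s > med])
    return med - n_sd * low_sd, med, med + n_sd * high_sd


def is_acceptable(size, low, high):
    """True if an insert size lies within the thresholds (inclusive)."""
    return low <= size <= high


# Hypothetical insert sizes with a longer tail above the median.
example_sizes = [190, 195, 200, 205, 220]
low, med, high = insert_size_thresholds(example_sizes)
```

Because the two sides are treated independently, a skewed insert size distribution yields asymmetric thresholds: in this example the upper threshold sits further from the median than the lower one.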

 
