Searching

The SISu search engine accepts several query syntaxes. You can search by gene, by rs number, by position or by a range of positions. Keep in mind that all positions in SISu use the GRCh37 assembly. Searches are not case-sensitive, so brca1, BRCA1 and brCa1 all work the same. The table below gives an example of each way of searching.

Search for | How to | Description
Gene | BRCA1 | Search for a specific gene by its gene symbol
Rs number | rs23445342 | Search for a variant by its rs number (if one exists)
Position | 17:3453222 | Search for a variant using chromosome:position syntax (GRCh37)
Range of positions | 17:2353334-3453222 | Search for all variants located between two positions (GRCh37)
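As an illustration of how the four query forms in the table differ, here is a minimal sketch that classifies a query string. The function `parse_query`, the regular expressions and the handling of reversed ranges are assumptions made for this example; they are not SISu's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns inferred from the examples in the table (not SISu's code).
CHROM = r"(?:1[0-9]|2[0-2]|[1-9]|X|Y|MT)"
RANGE_RE = re.compile(rf"^({CHROM}):(\d+)-(\d+)$")
POS_RE = re.compile(rf"^({CHROM}):(\d+)$")
RS_RE = re.compile(r"^RS\d+$")


@dataclass
class Query:
    kind: str                  # "range", "position", "rsid" or "gene"
    value: str                 # normalized query text
    chrom: Optional[str] = None
    start: Optional[int] = None  # GRCh37 coordinates
    end: Optional[int] = None


def parse_query(text: str) -> Query:
    q = text.strip().upper()  # searches are case-insensitive
    if m := RANGE_RE.match(q):
        a, b = int(m.group(2)), int(m.group(3))
        # Assumption: a reversed range is treated as the same interval.
        return Query("range", q, m.group(1), min(a, b), max(a, b))
    if m := POS_RE.match(q):
        pos = int(m.group(2))
        return Query("position", q, m.group(1), pos, pos)
    if RS_RE.match(q):
        return Query("rsid", q.lower())
    # Anything else is treated as a gene symbol.
    return Query("gene", q)
```

For example, `parse_query("brCa1")` yields a gene query for `BRCA1`, while `parse_query("17:2353334-3453222")` yields a range on chromosome 17.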

Understanding the data

The columns displayed in the SISu search results are explained in detail below:

Chr:position
Chromosome (1-22, X, Y, MT) and GRCh37/Hg19 position.

Rs number
dbSNP build 146 rsID of the variant (if available).

Ref/Alt
Reference and alternative allele(s) relative to the reference genome.

Allele frequency
Alternative allele frequency after quality control.

Allele frequency (pre-qc)
Alternative allele frequency before quality control.

N (ref hom / het / alt hom)
Number of individuals with each genotype (reference allele homozygous, heterozygous and alternative allele homozygous). Shown only for bi-allelic sites, after quality control.

Hardy-Weinberg P
Hardy-Weinberg equilibrium P-value (only for bi-allelic sites).
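To show what the Hardy-Weinberg P measures, here is a sketch computed from the three genotype counts in the N column. SISu does not state which test it uses; this example substitutes the simple chi-square goodness-of-fit test with one degree of freedom. Many pipelines use an exact test instead, which behaves better for rare variants, so values from this sketch may differ from those shown in SISu. The function name `hwe_chisq_p` is hypothetical.

```python
import math


def hwe_chisq_p(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Chi-square (1 df) Hardy-Weinberg P-value for a bi-allelic site (illustrative)."""
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternative allele frequency
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    if min(expected) == 0:
        return 1.0  # monomorphic site: no deviation can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

Counts that match the expected proportions exactly, such as 25 / 50 / 25, give P = 1, while a site with no heterozygotes, such as 50 / 0 / 50, gives a very small P.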

Annotation
Variant Effect Predictor (VEP) annotation: the most severe consequence.

Repetitive region
Indicates whether the variant is located in a tandem repeat or segmental duplication region.

Filter PASS
VQSR filter flag. PASS means that a confident variant call was made at this position.

Enriched ln (OR)
Enrichment in the Finnish population: the natural logarithm of AF(Finnish) / max AF(other populations in ExAC). This value is shown only if it is greater than 1.5.
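The enrichment value can be sketched as follows. The column is labeled ln (OR), but the sketch follows the formula as written above, a ratio of allele frequencies. The function name, the population labels and the frequencies are hypothetical, and how SISu handles a zero frequency is not documented, so this sketch simply returns `None` in that case.

```python
import math
from typing import Mapping, Optional

DISPLAY_THRESHOLD = 1.5  # the value is shown only when it exceeds 1.5


def enrichment_ln_ratio(af_finnish: float,
                        af_others: Mapping[str, float]) -> Optional[float]:
    """ln(AF(Finnish) / max AF(other ExAC populations)), or None if not shown."""
    max_other = max(af_others.values(), default=0.0)
    if af_finnish <= 0 or max_other <= 0:
        # Ratio or logarithm undefined; SISu's handling of this case is unknown.
        return None
    value = math.log(af_finnish / max_other)
    return value if value > DISPLAY_THRESHOLD else None
```

With a hypothetical Finnish AF of 0.01 and a highest other-population AF of 0.001, the value is ln(10), about 2.3, so it would be displayed; a twofold enrichment (ln 2, about 0.69) would not.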

Capturekit coverage (shown in detailed view only)
Lists the exome capture kits that cover the variant in their bait regions (A11 = agilent11refseqplus3boosters, A50 = agilentsureselect50mb, H21 = seqcap_hgsc_vcrome, IC1 = illuminacodingv1).

About the visualization

Variants that are present in the Finrisk dataset also have a map visualization in SISu. The visualization differs for common variants (>=75 samples) and rare variants (<75 samples). It is based on the county of residence of the sampled persons and shows the minor allele frequency for each county, for both common and rare variants. For rare variants it also shows the locations of individual samples on the map. The minor allele frequency for each county is calculated with the following formula, where each N is the number of samples from that county with the given genotype:

$$\mathrm{MAF}_{\text{county}} = \frac{N_{\text{het}} + 2N_{\text{alt hom}}}{2\,(N_{\text{ref hom}} + N_{\text{het}} + N_{\text{alt hom}})}$$
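The county formula translates directly into code. In this sketch the function name is hypothetical, and returning `None` for a county without samples is a convention chosen for the example. Note that the formula counts alternative alleles, so it matches the minor allele frequency whenever the alternative allele is the minor one.

```python
from typing import Optional


def county_allele_frequency(n_ref_hom: int, n_het: int,
                            n_alt_hom: int) -> Optional[float]:
    """Allele frequency for one county from its genotype counts (illustrative)."""
    n_individuals = n_ref_hom + n_het + n_alt_hom
    if n_individuals == 0:
        return None  # no samples from this county
    # Each heterozygote carries one alternative allele, each alt homozygote two.
    alt_alleles = n_het + 2 * n_alt_hom
    # Diploid: every individual contributes two alleles.
    return alt_alleles / (2 * n_individuals)
```

For hypothetical counts of 90 / 9 / 1, the county frequency is (9 + 2) / 200 = 0.055.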

In both visualizations, hovering the mouse over a county shows a tooltip with that county's allele frequency and N:s (ref hom, het, alt hom). Keep in mind that population density in Finland is not uniform across the country. Refer to the population density map below (source: Wikipedia 2016) to see how the population is distributed.

Common variant visualization

For common variants (>=75 samples), the visualization shows a colored map of Finland. A white area with dashed borders means that no Finrisk samples exist for that county. Counties with samples have grey borders and are shaded on a scale from white to blue (#0095A9) according to their minor allele frequency: the bluer the county, the higher its allele frequency. The shade is linearly normalized between zero (white) and the highest county allele frequency for that variant (#0095A9). Because of this, visualizations of different variants cannot be directly compared with each other.
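The linear normalization described above can be sketched as a color interpolation. This is an illustration of the described scaling, not SISu's rendering code; the function names and the rounding to whole RGB values are assumptions.

```python
from typing import Dict, Optional, Tuple


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return (int(color[0:2], 16), int(color[2:4], 16), int(color[4:6], 16))


def county_colors(afs: Dict[str, Optional[float]],
                  end_color: str = "#0095A9") -> Dict[str, Optional[str]]:
    """Shade each county linearly from white (AF = 0) to end_color (highest AF)."""
    present = [af for af in afs.values() if af is not None]
    max_af = max(present, default=0.0)
    end_rgb = hex_to_rgb(end_color)
    colors: Dict[str, Optional[str]] = {}
    for county, af in afs.items():
        if af is None:
            colors[county] = None  # no samples: drawn white with dashed borders
            continue
        # Normalize against this variant's highest county AF: 0 -> white, 1 -> end_color.
        t = af / max_af if max_af > 0 else 0.0
        r, g, b = (round(255 + (c - 255) * t) for c in end_rgb)
        colors[county] = f"#{r:02X}{g:02X}{b:02X}"
    return colors
```

Because `t` is relative to the highest county frequency of the same variant, a county drawn in full #0095A9 for one variant and one drawn in full #0095A9 for another can have very different absolute frequencies. Passing `end_color="#DBDBDB"` gives the white-to-grey scale used in the rare variant view.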

Rare variant visualization

For rare variants (<75 samples), the visualization shows a greyscale map of Finland with black dots scattered across it. As in the common variant visualization, a white county with dashed borders means that no samples have been gathered from that county. Counties with samples are shaded on a scale from white to grey (#DBDBDB) according to their minor allele frequency: the darker the county, the higher its allele frequency. As in the common variant visualization, the shade is linearly normalized between zero and the highest county allele frequency. Thus, visualizations of different variants cannot be directly compared with each other.

In the rare variant visualization, the black dots show the birthplaces of the individual sampled persons. The database stores these locations only at the municipality level, so the visualization slightly randomizes each sample's position so that all individual samples can be shown on the map without overlapping. Do not read the locations as exact coordinates; treat them as a guide to the general area of each sample. One individual may also appear in the dataset several times and is then shown on the map multiple times. Always compare the values shown on the map with the SISu reference data, and be especially careful if the map shows more homozygous or heterozygous individuals than the aggregated SISu reference data.
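The position randomization can be illustrated with a small jitter sketch. SISu does not document its method; here each sample is placed at a hypothetical municipality centroid plus a uniform random offset of up to a few kilometres, and the function name, offset size, seed and coordinates are all assumptions. Each record produces its own dot, which is why an individual who appears in the dataset twice is drawn twice.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

KM_PER_DEGREE_LAT = 111.0  # rough length of one degree of latitude


def jitter_points(sample_municipalities: Sequence[str],
                  centroids: Dict[str, Tuple[float, float]],
                  max_offset_km: float = 5.0,
                  seed: int = 0) -> List[Tuple[float, float]]:
    """Return one (lat, lon) dot per sample record, offset from its municipality."""
    rng = random.Random(seed)  # fixed seed keeps the map stable between renders
    points = []
    for municipality in sample_municipalities:
        lat, lon = centroids[municipality]
        # Convert km offsets to degrees; longitude degrees shrink with latitude.
        d_lat = rng.uniform(-max_offset_km, max_offset_km) / KM_PER_DEGREE_LAT
        d_lon = rng.uniform(-max_offset_km, max_offset_km) / (
            KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
        points.append((lat + d_lat, lon + d_lon))
    return points
```

Two samples from the same municipality, for example `jitter_points(["Helsinki", "Helsinki"], {"Helsinki": (60.17, 24.94)})`, land near each other but not on top of each other, which is why the dots indicate only a general area.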