SurvCurv - Database for survival and other incident curves

General
Data
Mathematical Models
Plotting Options
Tests
Cox Proportional Hazards Analysis

General

What is SurvCurv?

SurvCurv stands for 'Survival Curves'. It is a database of survival data, developed at the EBI. These data can be represented in different ways, with the survival curve being a common representation.

How can I cite SurvCurv in a publication?

SurvCurv has been published in:

Ziehm M, Ivanov DK, Bhat A, Partridge L, Thornton JM. SurvCurv database and online survival analysis platform update. Bioinformatics. 2015
DOI:10.1093/bioinformatics/btv463 PubMed: 26249811

Ziehm M, Thornton JM. Unlocking the potential of survival data for model organisms through a new database and online analysis platform: SurvCurv. Aging Cell. 2013. 12(5). 910-916
DOI:10.1111/acel.12121 PubMed: 23826631

If using datasets please refer to the SurvCurv ID and cite the linked original publication if available. Please see also the section on Data Licensing.

How can I reference a SurvCurv entry in a publication?

Please refer to the SurvCurv ID, cite the linked original publication if available, and please also cite the database. This will help us to gain support for the database. Please see also the section on Data Licensing.

Scripting/Batch requests

DO NOT SUBMIT JOBS THROUGH SCRIPTS OR ROBOTS!
If you wish to use this service for a large number of requests, or in an automated way, please contact us through the form.

Data

What are censored observations?

Censored observations are observations that have ended due to reasons other than the normal end points (here natural death). Such reasons could be animals being randomly selected for sample collection, dying of accidents (e.g. caused by flooding, air conditioning failure, etc), escaping, or dying from disease or from atypical causes. These observations, although incomplete, still bear information for survival analysis (animals have survived until the censoring event) and should thus never be omitted.
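To illustrate why censored observations should be kept, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. The function name and the toy data are assumptions made for this example; it is not SurvCurv's own code. A censored animal counts as "at risk" until its censoring time but never counts as a death.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(time, survival)] from the Kaplan-Meier product-limit estimator.

    times  : end of observation for each animal (e.g. age in days)
    events : 1 if the observation ended in death, 0 if it was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t] > 0:
            survival *= 1.0 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

# Toy data: one animal is censored at day 20 (e.g. it escaped).
times = [10, 20, 20, 30, 40]
events = [1, 0, 1, 1, 1]
with_censoring = kaplan_meier(times, events)

# Omitting the censored animal instead changes the estimate.
kept = [t for t, e in zip(times, events) if e == 1]
censored_dropped = kaplan_meier(kept, [1] * len(kept))
```

In this toy example the estimated survival at day 20 is 0.6 when the censored animal is kept, but 0.5 when it is dropped.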

Data Licensing

Access to the web interface of SurvCurv is made under the EBI's Terms of Use. The public SurvCurv data is made available under a Creative Commons Attribution 3.0 Unported License. This license allows others to use, distribute, tweak, and build upon the database, even commercially, without any other restrictions than properly crediting the original work. Please attribute to: SurvCurv Database - http://www.ebi.ac.uk/thornton-srv/databases/SurvCurv/ and cite any of the scientific articles given below. If using individual data sets please also attribute the linked original publication if available.

Publication:
Ziehm M, Ivanov DK, Bhat A, Partridge L, Thornton JM. SurvCurv database and online survival analysis platform update. Bioinformatics. 2015
DOI:10.1093/bioinformatics/btv463 PubMed: 26249811

Ziehm M, Thornton JM. Unlocking the potential of survival data for model organisms through a new database and online analysis platform: SurvCurv. Aging Cell. 2013. 12(5). 910-916
DOI:10.1111/acel.12121 PubMed: 23826631

The following icons used on this webpage are from the Crystal Clear icon set by Everaldo Coelho, as provided through Wikimedia Commons.
The icons are licensed under the GNU Lesser General Public License (LGPL). These icons can be downloaded in a single package at Open Icon Library.
icon search

The following two icons, also used on this webpage, are derived from Crystal Clear icons and are provided under the LGPL.

What is SurvTab?

SurvTab is a simple tab-separated file format for survival data that we defined. SurvTab1 exists in two subversions, SurvTab1.0 and SurvTab1.2, which share the same survival data format but use different annotation formats above the data. For user uploads, the SurvTab/SurvTab1 subversion is determined automatically, whereas SurvTab2, which stores the survival data in a different way, needs to be selected explicitly. If you have not used SurvTab before, we recommend SurvTab1.2 or SurvTab2.
Please find a SurvTab1.2-template and equivalent MS Excel template 1.2 here. A SurvTab1.0-template and equivalent MS Excel template 1.0 are available as well.
Please note that the final file for analysis should be tab-delimited. In MS Excel, use "save as Text (Tab delimited)" (automatic ending ".txt"); in OpenOffice/LibreOffice, use "save as Text CSV" with the delimiter set to tab (automatic ending ".csv").

What is SurvTab2?

SurvTab2 is a simple tab-separated file format for survival data that we defined. It differs from SurvTab in the way the observations are stored. SurvTab2 stores a list of days/ages, each with the number of events and the event type. Event types other than death and censor need to be defined in the censoring definition. Please find a SurvTab2-template and equivalent MS Excel template 2 here.
Please note that the final file for analysis should be tab-delimited. In MS Excel, use "save as Text (Tab delimited)" (automatic ending ".txt"); in OpenOffice/LibreOffice, use "save as Text CSV" with the delimiter set to tab (automatic ending ".csv").

What is JMP® like Tab format?

This tab-separated file format for survival data is derived from the commercial statistics software JMP®. It should follow either the template in JMP® like Tab format 0 or the template in JMP® like Tab format 1, in which the indicator column encodes death events by zero or one, respectively.

What are the OASIS Tab formats?

These tab-separated file formats (basic and CoxPH) have been defined by the OASIS webservice and are provided here only for compatibility and interoperability reasons.

Can my data be kept private after I submit it?

Yes, all submitted information will be kept private until you release it. It will be visible only to you and the database administrators.

How can I download the individual cohorts from the database?

Please select the cohorts you would like to download as if you wanted to plot them. Then open the advanced options (next to the CoxPH button). Select the desired file format and click download.
Data in SurvCurv is made available under a Creative Commons Attribution 3.0 Unported License. Please attribute to: SurvCurv Database - http://www.ebi.ac.uk/thornton-srv/databases/SurvCurv/, cite any of the SurvCurv scientific articles and the linked original publication.
SurvCurv has been published in:

Ziehm M, Ivanov DK, Bhat A, Partridge L, Thornton JM. SurvCurv database and online survival analysis platform update. Bioinformatics. 2015
DOI:10.1093/bioinformatics/btv463 PubMed: 26249811

Ziehm M, Thornton JM. Unlocking the potential of survival data for model organisms through a new database and online analysis platform: SurvCurv. Aging Cell. 2013. 12(5). 910-916
DOI:10.1111/acel.12121 PubMed: 23826631

Can I download the complete database?

Please contact us through the form.

Can I download a dataset I submitted to you?

Yes, we are happy to send you the data of datasets you uploaded. Please contact us through the form.

What are these descriptive statistics?

mean	Mean is the sum of the values divided by the number of values	Wikipedia article
minimum	Minimum is the smallest occurring value.	Wikipedia article
lower quartile	Lower quartile is the value that cuts off the lowest 25% of the data.	Wikipedia article
median	Median is the value separating the higher half from the lower half. Unlike the mean, it is usually a value contained in the data set (for an even number of values it is commonly taken as the midpoint of the two middle values).	Wikipedia article
upper quartile	Upper quartile is the value that cuts off the highest 25% of the data.	Wikipedia article
maximum	Maximum is the largest occurring value.	Wikipedia article
mode	Mode is the most frequently occurring value.	Wikipedia article
standard deviation	Standard deviation is a measure of variability. It is expressed in the same units as the data, unlike variance. The standard deviation is the square root of the variance.	Wikipedia article
variance	Variance is a measure of variability. It is less easily interpretable than the standard deviation (see above).	Wikipedia article
kurtosis	Kurtosis is a measure of peakedness. Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations.	Wikipedia article
skewness	Skewness is a measure of asymmetry. A value of zero indicates a relatively even distribution on both sides of the mean.	Wikipedia article
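As a rough illustration of the statistics in the table above, the following sketch computes them for a small list of lifespans using only the Python standard library. The function name and the data are made up for this example. Several conventions exist for quartiles, skewness and kurtosis, so the choices here (inclusive quartiles, moment-based skewness, excess kurtosis) are assumptions and may differ from what SurvCurv reports.

```python
import statistics

def describe(values):
    """Descriptive statistics for a list of numbers (e.g. lifespans in days)."""
    n = len(values)
    mean = statistics.mean(values)
    # Quartile cut points; other interpolation conventions give slightly different values.
    lower_q, _, upper_q = statistics.quantiles(values, n=4, method="inclusive")
    # Central moments used for the moment-based skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "minimum": min(values),
        "lower quartile": lower_q,
        "median": statistics.median(values),
        "upper quartile": upper_q,
        "maximum": max(values),
        "mode": statistics.mode(values),
        "standard deviation": statistics.stdev(values),  # square root of the variance
        "variance": statistics.variance(values),         # sample variance (n - 1)
        "kurtosis": m4 / m2 ** 2 - 3,                    # excess kurtosis (normal = 0)
        "skewness": m3 / m2 ** 1.5,
    }

lifespans = [40, 45, 45, 50, 55, 60, 62]
stats = describe(lifespans)
```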

Mathematical Models

What are the mathematical models?

The mathematical models are mathematical descriptions of idealised survival curves of different shape. The models are characterised by a number of parameters that are fitted to the data. The fitted parameter values can be compared between cohorts and can be used to find cohorts with similar characteristics using the “Find similar cohort” function available in the detailed section of each cohort. The meaning of the different parameters is explained under the various models.

What is mathematical mortality model analysis?

When this option is activated, mathematical mortality models are calculated for each individual cohort from user files and for each meta-cohort. Calculating the optimal model parameters takes some time, so please be patient, especially for large numbers of observations or cohorts, or at busier times.
If more than one cohort is present, pair-wise joint mortality models are fitted. For each parameter, a joint model with separate parameter values for each cohort is compared to a joint model with an identical parameter value for both cohorts, and the significance of the difference in that parameter is calculated.

What is log(MLE)?

MLE stands for Maximum Likelihood Estimate. The value reported here is the maximised likelihood, i.e. the likelihood of the observed data under the specific model with its best-fitting parameter values. As is common, we present the logarithm of the likelihood. The log(MLE) does not take the number of parameters of the model into account (see AIC and BIC for measures which do). Higher log(MLE) scores indicate a better fit. For more information on Maximum Likelihood Estimation look for example at the Wikipedia page.
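As a small worked example, the log-likelihood of the exponential survival model (constant hazard rate a) can be written down directly. This sketch uses the standard likelihood for right-censored data and is an illustration only, not SurvCurv's fitting code; the function names and data are assumptions. A death at time t contributes log(a) - a * t, a censored observation contributes only -a * t.

```python
import math

def exponential_log_likelihood(a, times, events):
    """Log-likelihood of a constant hazard rate a for right-censored data.

    events[i] is 1 for a death and 0 for a censored observation.
    """
    return sum(e * math.log(a) - a * t for t, e in zip(times, events))

def fit_exponential(times, events):
    """Closed-form maximum likelihood estimate: deaths / total observed time."""
    a_hat = sum(events) / sum(times)
    return a_hat, exponential_log_likelihood(a_hat, times, events)

times = [10, 20, 20, 30, 40]
events = [1, 0, 1, 1, 1]
a_hat, log_mle = fit_exponential(times, events)
```

Any other value of a gives a lower log-likelihood for the same data, which is what "maximum" refers to.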

What is AIC?

AIC stands for Akaike Information Criterion and measures the relative goodness of fit of a statistical model taking into account the number of parameters. In the AIC lower values indicate better fit relative to the number of parameters. The AIC is useful for comparing different models for the same data, but it gives no absolute quality estimate. This also implies that AIC values for models of different data sets are not comparable. The BIC is a similar measure that penalizes the number of parameters more strongly than the AIC. For more information look for example at the Wikipedia page.

What is BIC?

BIC stands for Bayesian Information Criterion (also called Schwarz criterion) and measures the relative goodness of fit of a statistical model taking into account the number of parameters. In the BIC lower values indicate better fit relative to the number of parameters. The BIC is useful for comparing different models for the same data, but it gives no absolute quality estimate. This also implies that BIC values for models of different data sets are not comparable. The AIC is a similar measure penalizing the number of parameters less strongly than the BIC. For more information look for example at the Wikipedia page.
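Both criteria are simple functions of the log(MLE). The sketch below uses the textbook definitions AIC = 2k - 2 log(L) and BIC = k ln(n) - 2 log(L), where k is the number of model parameters and n the number of observations. The function names and the numbers are illustrative, and which count SurvCurv uses for n is not stated here.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits of a 2-parameter and a 3-parameter model to the same 100 observations.
aic_simple, aic_extended = aic(-100.0, 2), aic(-99.5, 3)
bic_simple, bic_extended = bic(-100.0, 2, 100), bic(-99.5, 3, 100)
```

In this made-up case the extra parameter improves the log-likelihood by only 0.5, so both criteria prefer the simpler model, and the BIC does so by a larger margin.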

Exponential Survival Model

The exponential survival model assumes that the probability of dying is constant over time. Thus the surviving fraction decreases exponentially over time, which gives the model its name. This is the simplest survival model; it has one free parameter, the time-independent hazard rate a. In general the model does not fit survival in protected environments very well.
Survival = e^{-a * t}
log(Mortality) = log(a)

Weibull Survival Model

The Weibull survival model assumes that the probability of dying increases over time as a power function rather than exponentially. The model has two parameters: the baseline hazard rate a and the parameter b, which controls the rate of increase in mortality with time.
Survival = e^{-(a * t)^b}
log(Mortality) = log(a^b * b * t^{b - 1})
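The two Weibull formulas above can be written as a short sketch (function names and parameter values are chosen for this example only). With b = 1 the model reduces to the exponential model with hazard rate a.

```python
import math

def weibull_survival(t, a, b):
    """Fraction surviving to time t under the Weibull model."""
    return math.exp(-((a * t) ** b))

def weibull_log_mortality(t, a, b):
    """Logarithm of the mortality (hazard) at time t > 0."""
    return math.log(a ** b * b * t ** (b - 1))

# With b = 1 the mortality is constant and equal to a (exponential model).
constant_case = weibull_log_mortality(10.0, 0.05, 1.0)

# With b > 1 the mortality increases with age.
early = weibull_log_mortality(10.0, 0.02, 3.0)
late = weibull_log_mortality(50.0, 0.02, 3.0)
```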

Reference Publications:

Pletcher SD, Khazaeli AA, Curtsinger JW. Why do life spans differ? Partitioning mean longevity differences in terms of age-specific mortality parameters. J Gerontol A Biol Sci Med Sci. 2000 Aug;55(8):B381-B389 doi:10.1093/gerona/55.8.B381 PubMed PMID: 10952359
Pletcher SD. Model fitting and hypothesis testing for age-specific mortality data. J Evol Biol. 1999 May;12(3):430-439 doi:10.1046/j.1420-9101.1999.00058.x

Gompertz Survival Model

The Gompertz survival model assumes that the probability of dying increases exponentially over time. The model has two parameters: the baseline hazard rate a and the rate of the exponential increase in mortality with time b. Note that parameter b of the Gompertz model acts on mortality exponentially (e^{b * t}), while in the Weibull model it acts as a power of time (t^{b - 1}).
Survival = e^{-a / b * (e^{b * t} - 1)}
log(Mortality) = log(a * e^{b * t})

Reference Publications:

Pletcher SD, Khazaeli AA, Curtsinger JW. Why do life spans differ? Partitioning mean longevity differences in terms of age-specific mortality parameters. J Gerontol A Biol Sci Med Sci. 2000 Aug;55(8):B381-B389 doi:10.1093/gerona/55.8.B381 PubMed PMID: 10952359
Pletcher SD. Model fitting and hypothesis testing for age-specific mortality data. J Evol Biol. 1999 May;12(3):430-439 doi:10.1046/j.1420-9101.1999.00058.x

Gompertz-Makeham Survival Model

The Gompertz-Makeham survival model is an extension of the Gompertz survival model. Makeham's extension adds a time-independent term c to the mortality.
Survival = e^{-c * t - a / b * (e^{b * t} - 1)}
log(Mortality) = log(c + a * e^{b * t})
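The survival and mortality formulas of a model are two views of the same thing: the mortality (hazard) is the negative slope of the log survival. The following sketch checks this numerically for the Gompertz-Makeham model; setting c = 0 gives the plain Gompertz model. Function names and parameter values are illustrative assumptions.

```python
import math

def gompertz_makeham_survival(t, a, b, c=0.0):
    """Fraction surviving to time t; c = 0 gives the Gompertz model."""
    return math.exp(-c * t - a / b * (math.exp(b * t) - 1.0))

def gompertz_makeham_mortality(t, a, b, c=0.0):
    """Mortality (hazard) at time t; SurvCurv plots its logarithm."""
    return c + a * math.exp(b * t)

def numerical_hazard(survival, t, h=1e-4):
    """Negative derivative of log(survival) by central differences."""
    return -(math.log(survival(t + h)) - math.log(survival(t - h))) / (2.0 * h)

a, b, c = 0.001, 0.1, 0.002
analytic = gompertz_makeham_mortality(50.0, a, b, c)
numeric = numerical_hazard(lambda t: gompertz_makeham_survival(t, a, b, c), 50.0)
```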

Reference Publications:

Pletcher SD, Khazaeli AA, Curtsinger JW. Why do life spans differ? Partitioning mean longevity differences in terms of age-specific mortality parameters. J Gerontol A Biol Sci Med Sci. 2000 Aug;55(8):B381-B389 doi:10.1093/gerona/55.8.B381 PubMed PMID: 10952359
Pletcher SD. Model fitting and hypothesis testing for age-specific mortality data. J Evol Biol. 1999 May;12(3):430-439 doi:10.1046/j.1420-9101.1999.00058.x

Logistic Survival Model

The Logistic survival model is a different extension of the Gompertz survival model. This extension makes it possible to model a deceleration in mortality at old age. The model has three parameters: the baseline hazard rate a and the rate of the exponential increase in mortality with time b, analogous to the Gompertz survival model, and a new parameter s. The parameter s describes the deceleration at advanced age, while at the same time representing the degree of heterogeneity in the population.
Survival = (1 + s * a / b * (e^{b * t} - 1))^{-1/s}
log(Mortality) = log((a * e^{b * t}) / (1 + s * a / b * (e^{b * t} - 1)))

Reference Publications:

Pletcher SD, Khazaeli AA, Curtsinger JW. Why do life spans differ? Partitioning mean longevity differences in terms of age-specific mortality parameters. J Gerontol A Biol Sci Med Sci. 2000 Aug;55(8):B381-B389 doi:10.1093/gerona/55.8.B381 PubMed PMID: 10952359
Pletcher SD. Model fitting and hypothesis testing for age-specific mortality data. J Evol Biol. 1999 May;12(3):430-439 doi:10.1046/j.1420-9101.1999.00058.x

Logistic-Makeham Survival Model

The Logistic-Makeham survival model is an extension of the Logistic survival model. Makeham's extension adds a time-independent term c to the mortality, in the same way as in the Gompertz-Makeham survival model.
Survival = e^{-c * t} * (1 + s * a / b * (e^{b * t} - 1))^{-1/s}
log(Mortality) = log(c + (a * e^{b * t}) / (1 + s * a / b * (e^{b * t} - 1)))
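A short sketch of the Logistic-Makeham mortality shows the two properties described for the logistic models: for very small s the mortality approaches the Gompertz form a * e^{b * t}, and at advanced ages it levels off at a plateau of b / s (plus c). Function names and parameter values are assumptions for this illustration.

```python
import math

def logistic_makeham_mortality(t, a, b, s, c=0.0):
    """Mortality (hazard) at time t; c = 0 gives the Logistic model."""
    growth = math.exp(b * t)
    return c + a * growth / (1.0 + s * a / b * (growth - 1.0))

def logistic_makeham_survival(t, a, b, s, c=0.0):
    """Fraction surviving to time t; c = 0 gives the Logistic model."""
    growth = math.exp(b * t)
    return math.exp(-c * t) * (1.0 + s * a / b * (growth - 1.0)) ** (-1.0 / s)

a, b, s = 0.001, 0.1, 0.5

# Deceleration: at very old ages the mortality approaches the plateau b / s.
plateau = logistic_makeham_mortality(500.0, a, b, s)

# Nearly no heterogeneity (tiny s): the mortality is close to Gompertz.
near_gompertz = logistic_makeham_mortality(50.0, a, b, 1e-9)
gompertz = a * math.exp(b * 50.0)
```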

Reference Publications:

Pletcher SD, Khazaeli AA, Curtsinger JW. Why do life spans differ? Partitioning mean longevity differences in terms of age-specific mortality parameters. J Gerontol A Biol Sci Med Sci. 2000 Aug;55(8):B381-B389 doi:10.1093/gerona/55.8.B381 PubMed PMID: 10952359
Pletcher SD. Model fitting and hypothesis testing for age-specific mortality data. J Evol Biol. 1999 May;12(3):430-439 doi:10.1046/j.1420-9101.1999.00058.x

Plotting Options

Absolute Representations

Survival Curve	This curve shows the percentage of the population alive over time. It is the most common representation of lifespan data. The survival curve is a cumulative representation.
Death Curve	This curve shows the percentage of the total population that has died over time. It is the complement of the survival curve and, like the survival curve, a cumulative representation.
Incidence Curve	This curve shows the distribution of death events or incidences over time. This representation is non-cumulative!
Mortality Curve	This curve shows the mortality, i.e. the risk of dying at each time point among the animals still alive, on a log scale. This representation is non-cumulative!
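The relationship between the four representations can be sketched from a simple table of deaths per day. This toy example has no censoring and uses a simple per-interval mortality estimate (deaths divided by animals alive at the start of the interval); the exact estimator used by SurvCurv is not described here, so treat these choices and all names as assumptions.

```python
import math

def absolute_representations(n_start, deaths_per_day):
    """Return one row per day with survival, death, incidence and log mortality."""
    alive = n_start
    rows = []
    for day, deaths in enumerate(deaths_per_day, start=1):
        # Mortality: risk of dying during this interval among those still alive.
        mortality = deaths / alive if alive > 0 else None
        alive -= deaths
        rows.append({
            "day": day,
            "survival": 100.0 * alive / n_start,          # cumulative, percent alive
            "death": 100.0 * (n_start - alive) / n_start,  # cumulative, percent dead
            "incidence": deaths,                           # non-cumulative
            "log_mortality": math.log(mortality) if mortality else None,
        })
    return rows

rows = absolute_representations(10, [1, 0, 3, 6])
```

Days without deaths have no finite log mortality, which is why mortality curves can show gaps.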

Survival density plots

Survival density plots are a newly defined visualisation showing the distribution of a group of survival curves as a two-dimensional density. They were introduced in our SurvCurv publication of 2015.

Survival density plot	Example of a survival density plot showing the variation in female survival across 290 Drosophila Genetic Reference Panel (DGRP) lines, overlaid with the overall survival of female controls of the Drosophila strains wDah (red) and w1118 (purple) from other experiments. Survival curve density is represented in a sea-colour scheme ranging from white (no density) via light blue, dark blue and dark green to black for the highest occurring densities. Please note that the colour is rescaled to the maximum occurring density in each plot, with the colour legend showing the respective density estimates. These density estimates are equivalent to the standard 1D density estimates of R, with the area under each 1D density curve normalised to 1. When comparing two density plots, higher maximum numbers on the scale mean an overall steeper, more peaked density distribution, while lower numbers indicate a flatter, broader distribution. (The DGRP lifespan data are unpublished and not available in SurvCurv; special thanks to Trudy F. C. Mackay for permission to show their variation as an example here.)

There are two ways to create survival density plots: 1) using the plot select box, or 2) using the advanced option "Use cohort number x as density background". If you would like to add a database-based density background to curves from your file, please see 3).

1) Select all cohorts to be included in the plot or upload a data file, select "survival density plot" from the drop-down menu and click "Plot". Further cohorts can be added to an existing density plot using "Add". With this method all cohorts are used in the density estimate, and superposing cohorts is not possible. Data from files can be used.
2) This mode can only use data from the database for the density estimate. Open the advanced options and select a historic control, or use "build your own cohort" to define a cohort comprising the cohorts that should be used in the density. Add or plot this cohort as desired (using "Survival Curve" from the plot options), and add further cohorts for superposing as desired. Once the plot with the meta-cohort and all individual cohorts is shown, type the number of the meta-cohort (counting starts with 1 from the top) and click "Plot". This uses the criteria of the meta-cohort to define the density background and, if selected, plots any individual cohorts or further meta-cohorts superposed on it.
3) Please first find the cohorts for the density background in the database and follow the instructions under 2). Then add the survival data from your file after creating the meta-cohort, using the advanced options and "Add". Type 1 as the cohort number to use as density background and confirm by clicking "Plot".

Note: With the black-and-white colouring option, density plots use a grey scale instead of the default sea-colour scheme.

Difference Representations

Difference plots are newly defined plots introduced in the SurvCurv publication. They represent only the differences between a pair of cohorts, such as control and treatment or female and male. There are 4 difference plots based on the 4 absolute representations, each showing the difference instead of the absolute value. Thus, for difference survival curves, for example, positive values indicate a survival advantage of the treatment compared to the control and negative values a disadvantage. Difference plots can be based not only on survival curves but also on mortality curves, showing relative mortality differences. Here, a line below zero would indicate a lower mortality risk in the treatment.
All these difference plots also support the use of mathematical models in addition to or instead of survival data. This can be useful for exploring the differences between cohorts via the corresponding models. Alternatively, the differences between two different mathematical models of the same survival data can be visualised in the same way, highlighting the differences between the models.

What does "connection style" mean?

The connection style refers to the way the gaps between the observations are handled. No matter how often observations are performed, they happen at discrete time points, e.g. daily, while the plot has a continuous axis, resulting in "missing" values, or gaps. The “connection style” defines how this discrepancy is handled:

steps	This option assumes that all gaps are due to (omitted) observations of no events, resulting in a stepping behaviour. Mathematically speaking, this is the "correct" representation for survival curves calculated using the Kaplan-Meier estimator (which is generally used).
lines	This option assumes that the observed events actually happened equally spread between the last observation and the current one. Thus, this option places straight lines between observations.
lines & points	This option shows both the lines option and the points option together.
points	This just shows the individual points defined by observations, instead of any kind of curve.
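The difference between the "steps" and "lines" styles can be made concrete by asking what value each style shows at a time between two observations. The helper names and data points below are assumptions for this sketch.

```python
from bisect import bisect_right

def step_value(points, t):
    """'steps': hold the last observed value until the next observation."""
    times = [time for time, _ in points]
    i = bisect_right(times, t) - 1
    return points[max(i, 0)][1]

def line_value(points, t):
    """'lines': interpolate linearly between neighbouring observations."""
    times = [time for time, _ in points]
    i = bisect_right(times, t) - 1
    if i < 0:
        return points[0][1]
    if i >= len(points) - 1:
        return points[-1][1]
    (t0, y0), (t1, y1) = points[i], points[i + 1]
    return y0 + (y1 - y0) * (t - t0) / (t1 - t0)

# (day, fraction surviving) at the observation times
observed = [(0, 1.0), (10, 0.8), (20, 0.4)]
step_at_15 = step_value(observed, 15)
line_at_15 = line_value(observed, 15)
```

At day 15 the "steps" style still shows the day-10 value of 0.8, while the "lines" style shows a value halfway between the day-10 and day-20 observations, approximately 0.6.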

What output formats are supported?

Currently five different file formats for the graphical output are supported (SVG, PDF, PS, TIFF and PNG). Each of these formats has specific advantages and common uses. SVG, PDF and PS use vector graphics and are commonly welcomed by scientific journals as image formats, while TIFF and PNG are raster graphics, with PNG being the web-optimized default.

SVG	Scalable Vector Graphics (SVG) is an XML based vector graphics file format developed by the World Wide Web Consortium (W3C). The graphics are fully scalable and labels are included as extractable text. All major modern web browsers have some degree of support and render SVG images directly (MS Internet Explorer before version 9 does not support SVG natively).	Wikipedia article
PDF	Portable Document Format (PDF) was a proprietary file format of Adobe, now released as an open standard. It is a general document format and can include vector graphics, raster graphics and text. The PDF generated here uses vector graphics and embedded text.	Wikipedia article
PS	PostScript (PS) is a programming language often used as page description language using vector based graphics. Many laser printers can directly print postscript files without any preparation by the computer.	Wikipedia article
TIFF	Tagged Image File Format (TIFF) is a generic graphics file format for handling images and data within a single file. It is widely supported by imaging, publishing and page layout applications; however, due to its complexity it is less well supported in other applications such as web browsers.	Wikipedia article
PNG	Portable Network Graphics (PNG) is an ISO standard bitmap file format using lossless compression. It is optimised for transferring images on the internet, not for professional-quality print graphics. PNG is set as default output format.	Wikipedia article
(numbers)	Text file with a tab-delimited table of time values (x-axis) and y-values depending on the plot type selected, e.g. survival value. Multiple cohorts are given in multiple columns. Only actual points are given; smoothing is applied if selected.

How does "smoothing" work?

Here, smoothing is a sliding window smoothing with selected window size. This means, each value is replaced with the average of the value, the x neighbours to the left and the x to the right. The smoothing is not applied to survival or death plots.
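A minimal sketch of this kind of sliding window average follows. How the ends of the curve are handled is not specified above; this example simply assumes the window is truncated there. The function name is hypothetical.

```python
def smooth(values, x):
    """Replace each value by the mean of itself and up to x neighbours on each side."""
    smoothed = []
    for i in range(len(values)):
        # Assumption: near the ends the window only covers the existing neighbours.
        window = values[max(0, i - x): i + x + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

incidence = [0, 3, 0, 3, 0]
smoothed = smooth(incidence, 1)
```

With x = 1 the jagged series 0, 3, 0, 3, 0 becomes 1.5, 1.0, 2.0, 1.0, 1.5; with x = 0 the values are unchanged.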

What are historical controls?

Historical control data are commonly used in toxicity and carcinogenicity studies in rodents in addition to the parallel control group. They put the study into perspective relative to existing ones and provide quality control for establishing the reasonableness of the current control group (see review reference below). We suggest that pooled historical controls would also be a valuable addition to the usual pair of measured control and measured treatment condition in ageing research.
We have defined a few potential historical control groups based on the annotated collection of survival measurements, which can be directly used. You can also define your own historical control using various criteria.

Reference:
Keenan, C., et al. Best practices for use of historical control data of proliferative rodent lesions. Toxicol Pathol. 2009; 37(5):679-693 doi:10.1177/0192623309336154

What does "automatically combine replicates" do?

For each selected cohort, this option searches the database for annotated replicates and uses the observations of all replicates together for the selected analyses. The annotated replicates present in the database are listed in the extended information of each cohort together with other related cohorts.

How to build your own meta-cohort?

This function allows you to combine observations into a virtual cohort according to the criteria you specify and can be used, for example, to create your own historical controls. You start by specifying a name for your virtual cohort and one or more criteria to define which observations to include. Possible criteria include species, strain, gender, study, date and treatment, as well as specific cohort IDs separated by commas (ranges can be given as "A-B") or even a search term. You can add multiple virtual cohorts one by one to your plot using the "Add" function.
Please note virtual cohorts that you create will not be stored and your definition will be lost when ending the session.
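To illustrate the ID syntax (comma-separated IDs with "A-B" ranges), here is a small hypothetical parser. It only demonstrates how such an input could be read; it is not SurvCurv's actual implementation, and details such as whitespace handling are assumptions.

```python
def parse_cohort_ids(spec):
    """Expand a string such as '164, 166-168, 170' into a list of integer IDs."""
    ids = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(piece) for piece in part.split("-", 1))
            ids.extend(range(start, end + 1))  # ranges are inclusive
        else:
            ids.append(int(part))
    return ids

selected = parse_cohort_ids("164, 166-168, 170")
```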

Tests

What is the log-rank test?

The log-rank test, also called Mantel-Cox test or Mantel-Haenszel test, assesses if there is a difference between two or more survival curves. For more details on how the test works and its relation to other tests, the Wikipedia article is a good starting point. The log-rank test is more sensitive than the Wilcoxon test to differences between groups later in time.
The test report of SurvCurv shows, for each cohort, some details of the test statistics, including the number of observed events and the number of expected events under the hypothesis that all the cohorts are identical. Below these, the chi-squared test statistic value, the degrees of freedom and the respective p-value are given. Finally, the overall test result based on the p-value is stated.

Reference publication:
Harrington DP & Fleming TR. A class of rank test procedures for censored survival data. Biometrika. 1982; 69(3):553-566 doi:10.1093/biomet/69.3.553
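The observed and expected event counts mentioned for the log-rank test can be sketched for the two-cohort case as follows. This is a textbook version of the test written for illustration, with assumed function names and toy data; it is not SurvCurv's code, and it handles only two groups (one degree of freedom).

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (observed_a, expected_a, chi2, p_value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in death_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected deaths in A if the cohorts were identical
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-squared tail probability, 1 df
    return observed_a, expected_a, chi2, p_value

short_lived = [5, 6, 7, 8, 9, 10, 11, 12]
long_lived = [20, 21, 22, 23, 24, 25, 26, 27]
all_died = [1] * 8
obs, exp, chi2, p = logrank_test(short_lived, all_died, long_lived, all_died)
```

For these clearly separated toy cohorts the short-lived cohort has many more observed than expected deaths, and the p-value is far below 0.05.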

What is the generalized Wilcoxon test?

The generalized Wilcoxon test (here Prentice's generalization, which is essentially equivalent to Peto & Peto's generalization) tests if there is a difference between two or more survival curves. The test is more sensitive than the log-rank test to differences between groups early in time.
The test report shows, for each cohort, some details of the test statistics, including the number of observed events and the number of expected events under the hypothesis that all the cohorts are identical. Below these, the chi-squared test statistic value, the degrees of freedom and the respective p-value are given. Finally, the overall test result based on the p-value is stated.

Reference publication:
Harrington DP & Fleming TR. A class of rank test procedures for censored survival data. Biometrika. 1982; 69(3):553-566 doi:10.1093/biomet/69.3.553

What is the Wang-Allison Score test?

The Wang-Allison Score test is a statistical test for differences between the maximal lifespans of two cohorts. To be more robust to sampling problems, i.e. problems caused by the limited number of observations, the longest-living 10% are used instead of the longest recorded lifespan. For each cohort the test report shows the number of observations in the joint 10% longest lifespans and the fraction of these compared to all observations in the cohort. This is followed by the total number of observations and the total in the joint 10% longest lifespans; here the fraction should be close to 0.10 (as we want the top 10%). Underneath the table the p-value and the age corresponding to the split are given, followed by the overall test statement based on the p-value.

Original publication:
Wang C, Li Q, Redden DT, Weindruch R, Allison DB. Statistical methods for testing effects on "maximum lifespan". Mech Ageing Dev. 2004 Sep;125(9):629-632. doi:10.1016/j.mad.2004.07.003 PubMed PMID: 15491681
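The counts shown in the Wang-Allison test report can be illustrated with a small sketch that finds the age splitting off the longest-living 10% of the pooled cohorts and counts how many animals of each cohort lie at or above it. The way the threshold is chosen and ties are handled here is an assumption, the function name is hypothetical, and the significance test itself is not included.

```python
def top_fraction_counts(lifespans_a, lifespans_b, fraction=0.10):
    """Return (threshold_age, (top_a, n_a), (top_b, n_b)) for the pooled top fraction."""
    pooled = sorted(lifespans_a + lifespans_b)
    k = max(1, round(fraction * len(pooled)))
    threshold = pooled[-k]  # age of the k-th longest-lived animal overall
    # Ties at the threshold can push the joint fraction slightly above the target.
    top_a = sum(1 for age in lifespans_a if age >= threshold)
    top_b = sum(1 for age in lifespans_b if age >= threshold)
    return threshold, (top_a, len(lifespans_a)), (top_b, len(lifespans_b))

control = list(range(1, 11))     # toy lifespans 1..10
treatment = list(range(11, 21))  # toy lifespans 11..20
threshold, (top_c, n_c), (top_t, n_t) = top_fraction_counts(control, treatment)
joint_fraction = (top_c + top_t) / (n_c + n_t)
```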

What is Fisher's exact test?

Fisher's exact test is a general statistical test to examine the significance of the association between two kinds of classifications, commonly represented in a contingency table. For testing the difference in survival between two cohorts A and B at a certain time point we construct the following contingency table:

	cohortA	cohortB
#alive/at risk at time t	a	b
#dead at time t	c	d

The p-value is then defined by [formula for p]

The test report shows you for each pair of cohorts the p-value as well as the 95% confidence intervals and the odds ratio estimation. The odds ratio is the odds of an event occurring in group A divided by the odds of it occurring in the group B. Thus, if both groups have the same risk the odds ratio is one. Values larger than one indicate a larger risk in group A, while values smaller than one indicate a higher risk in group B.
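For a contingency table like the one above, Fisher's exact test can be sketched with the hypergeometric distribution. The code below uses the standard textbook definition: with fixed margins, a table with a animals alive in cohort A has probability C(a+b, a) * C(c+d, c) / C(n, a+c), where n = a + b + c + d, and the two-sided p-value sums the probabilities of all tables that are no more likely than the observed one. The function names are assumptions, the odds ratio is the simple sample odds ratio, and confidence intervals are not computed, so results may differ in detail from the SurvCurv report.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the table [[a, b], [c, d]] (alive row, dead row)."""
    row_alive, row_dead, col_a, n = a + b, c + d, a + c, a + b + c + d

    def table_probability(x):
        # Probability of x alive animals in cohort A given the fixed margins.
        return comb(row_alive, x) * comb(row_dead, col_a - x) / comb(n, col_a)

    p_observed = table_probability(a)
    lowest, highest = max(0, col_a - row_dead), min(row_alive, col_a)
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-7)  # tolerance for rounding
    )

def sample_odds_ratio(a, b, c, d):
    """Odds of death in cohort A divided by the odds of death in cohort B."""
    return (c / a) / (d / b)

p_value = fisher_exact_two_sided(3, 1, 1, 3)
odds_ratio = sample_odds_ratio(3, 1, 1, 3)
```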

Cox Proportional Hazards Analysis

What is Cox Proportional Hazards?

The Cox proportional hazards model is a statistical model of survival data with one or more covariates or factors. It makes it possible to identify which factors contribute significantly to the overall model and to quantify their influence. The model operates on hazard rates, also known as mortality rates, and assumes that the hazards of the different conditions are proportional, i.e. multiples of each other. This assumption should be checked (see cox.zph test and diagnostic plots below) and taken into account when interpreting the results.
Please note that our Cox online analysis currently does not support stratification or time-varying co-variates.

Further information:
Cox proportional hazards model on Wikipedia

Original publication:
Cox D. Regression models and life-tables. J Roy Statist Soc Ser B (Methodological). 1972; 34:187-220. JSTOR 2985181

What does the result mean?

The Cox PH model returns for each covariate or factor a coefficient, and a p-value. The exp(coef) gives the multiplier of the mortality rate, i.e. exp(coef) < 1 indicates a reduced mortality rate corresponding to an increased survival. The p-value indicates the significance of the results.
Please always check the proportional hazards assumption (see cox.zph test and diagnostic plots below)!

What are interaction terms?

Interaction terms in a Cox PH model are additional factors for the co-occurrence of the potentially “interacting” covariates or factors. They make it possible to examine whether two covariates interact and in which way. Covariates might only exhibit an effect when co-occurring, e.g. the UAS-GAL4 expression system in Drosophila; in this case only the interaction term shows a significant effect. Covariates might also create a larger effect than the sum of the individual effects when co-occurring; in this case the interaction term shows a significant effect of the same type as the individual covariates. A third alternative is that one covariate inhibits the effect of another, which would be indicated by a significant interaction term with an effect opposing the individual one. If an interaction term is not significant, no evidence for an interaction between the factors was found.
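How coefficients and an interaction term combine can be shown with a few lines of arithmetic. In a Cox model the mortality multiplier relative to the baseline is exp of the sum of the coefficients that apply, and the interaction coefficient applies only when both factors are present. The coefficient values and names below are invented for illustration and are not results from SurvCurv.

```python
import math

def mortality_multiplier(coefs, gal4, uas):
    """Hazard multiplier relative to animals carrying neither factor.

    gal4 and uas are 0 or 1; the interaction term is their product.
    """
    linear_predictor = (
        coefs["GAL4"] * gal4
        + coefs["UAS"] * uas
        + coefs["GAL4:UAS"] * gal4 * uas
    )
    return math.exp(linear_predictor)

# Hypothetical coefficients: neither construct alone has an effect,
# but their combination reduces mortality.
coefs = {"GAL4": 0.0, "UAS": 0.0, "GAL4:UAS": -0.5}

gal4_only = mortality_multiplier(coefs, 1, 0)
uas_only = mortality_multiplier(coefs, 0, 1)
both = mortality_multiplier(coefs, 1, 1)
```

Here only the combination gives a multiplier below 1 (about 0.61, i.e. reduced mortality), matching the first case described above.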

What is the cox.zph test?

The cox.zph test is a test for the proportional hazards assumption of a Cox model. It is a chi-square test between the Kaplan-Meier transformed survival times and the Schoenfeld residuals as proposed by Grambsch and Therneau.
The p-values of the proportional hazards tests, like those of any other test, are strongly dependent on the sample size. Gross violations may not be statistically significant if the sample size is very small, and even slight violations, causing negligible errors in the estimated coefficients, may be highly significant if the sample size is very large. An estimate of the size of the deviations from the assumed independence, i.e. no correlation, is given by the respective correlation coefficients rho, which can thus be helpful in interpreting the test results.

Original publication:
Grambsch P & Therneau T. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994; 81:515-526

What are these diagnostic plots?

The diagnostic plots are a graphical way to assess the proportional hazards assumption. They show the scaled Schoenfeld residuals plotted against the transformed time. Additionally shown are a non-linear fitted line of the data (black solid) and a horizontal line (dotted blue) corresponding to the determined Cox coefficient. There should be no clear overall increasing or decreasing tendency, and the fitted non-linear line should roughly follow the dotted horizontal line. The example shown below is a very good case.

Example of Diagnostic Plot for the Proportional Hazards Assumption

Example is the diagnostic plot of the UAS-factor taken from CoxPH analysis of SurvCurv:164, 166, 168, 170 (Ikeya et al., 2009) using factors for GAL4, UAS and GAL4-UAS interaction.

Original publication of the used data:
Ikeya T, Broughton S, Alic N, Grandison R & Partridge L (2009) The endosymbiont Wolbachia increases insulin/IGF-like signalling in Drosophila. Proc Biol Sci, 276(1674), 3799-3807

What to do if CoxPH is violated?

If the proportional hazards assumption is violated with noticeable deviations, alternative or more complex analyses might be necessary. Options include Cox proportional hazards analysis with time-dependent covariates or transformed covariates, or Accelerated Failure Time (AFT) analysis. These are currently not available in SurvCurv. However, you can download the survival data of interest from the database and perform these analyses locally using your preferred statistics software, such as R.