## SurvCurv - Database for survival and other incident curves

## General

## What is SurvCurv?

SurvCurv stands for 'Survival Curves'. It is a database of survival data, developed at the EBI. These data can be represented in different ways, with the survival curve being a common representation.

## How can I cite SurvCurv in a publication?

SurvCurv has been published in:

Ziehm, M., Thornton, J.M., Unlocking the potential of survival data for model organisms through a new database and online analysis platform: SurvCurv, Aging Cell, 2013. DOI:10.1111/acel.12121 PubMed: 23826631

If using datasets, please refer to the SurvCurv ID and cite the linked original publication if available. Please see also the section on Data Licensing.

## How can I reference a SurvCurv entry in a publication?

Please refer to the SurvCurv ID, cite the linked original publication if available, and please also cite the database. This will help us to gain support for the database. Please see also the section on Data Licensing.

## Scripting/Batch requests

DO NOT SUBMIT JOBS THROUGH SCRIPTS OR ROBOTS!

If you wish to use this service for a large number of requests, or in an automated way, please contact us through the form.

## Data

## What are censored observations?

Censored observations are observations that have ended for reasons other than the normal end point (here natural death). Such reasons could be animals being randomly selected for sample collection, dying in accidents (e.g. caused by flooding, air conditioning failure, etc.), escaping, or dying from disease or from atypical causes. These observations, although incomplete, still carry information for survival analysis (the animals survived until the censoring event) and should thus never be omitted.

## Data Licensing

Access to the web interface of SurvCurv is made under the EBI's Terms of Use. The public SurvCurv data is made available under a Creative Commons Attribution 3.0 Unported License.
This license allows others to use, distribute, tweak, and build upon the database, even commercially, with no restriction other than properly crediting the original work. Please attribute to: SurvCurv Database - http://www.ebi.ac.uk/thornton-srv/databases/SurvCurv/ and cite the scientific article given below. If using individual data sets, please also attribute the linked original publication if available.

Publication: Ziehm, M., Thornton, J.M., Unlocking the potential of survival data for model organisms through a new database and online analysis platform: SurvCurv, Aging Cell, 2013. DOI:10.1111/acel.12121 PubMed: 23826631

The following icons used on this webpage are from the Crystal Clear icon set by Everaldo Coelho, as provided through Wikimedia Commons. The icons are licensed under the GNU Lesser General Public License (LGPL). These icons can be downloaded in a single package at Open Icon Library. The following two icons, also used on this webpage, are derived from Crystal Clear icons and are provided under the LGPL.

## What is SurvTab?

SurvTab is a simple tab-separated file format for survival data that we defined. Please find a SurvTab template and an equivalent MS Excel template here.

Please note that the final file for analysis should be tab-delimited. In MS Excel, use "save as Text (Tab delimited)" (automatic ending ".txt"); in OpenOffice/LibreOffice, use "save as Text CSV" with the delimiter set to tab (automatic ending ".csv").

## What is JMP® like Tab format?

This tab-separated file format for survival data is derived from the commercial statistics software JMP®.
It should follow either the template in JMP® like Tab format 0 or the template in JMP® like Tab format 1, which encode the death-event indicator as zero or one, respectively.

## What are the OASIS Tab formats?

These tab-separated file formats (basic and CoxPH) have been defined by the OASIS webservice and are provided here only for compatibility and interoperability reasons.

## Can my data be kept private after I submit it?

Yes, all submitted information will be kept private until you release it. It will be visible only to you and the database administrators.

## Can I download the complete database?

Please contact us through the form.

## Can I download a dataset I submitted to you?

Yes, we are happy to send you the data of datasets you uploaded. Please contact us through the form.

## What are these descriptive statistics?
## Mathematical Models

## What are the mathematical models?

The mathematical models are mathematical descriptions of idealised survival curves of different shapes. The models are characterised by a number of parameters that are fitted to the data. The fitted parameter values can be compared between cohorts and can be used to find cohorts with similar characteristics using the “Find similar cohort” function available in the detailed section of each cohort. The meaning of the different parameters is explained under the various models. Note that the "Mortality" formulas given for the models are on a logarithmic scale, i.e. they give the logarithm of the hazard rate.

## What is log(MLE)?

MLE stands for Maximum Likelihood Estimate; the value reported is the maximised likelihood of the specific model given the observed data. As is common, we present the logarithm of the likelihood. The log(MLE) does not take the number of parameters of the model into account (see AIC and BIC for measures which do). Higher log(MLE) scores indicate a better fit. For more information on Maximum Likelihood Estimation, see for example the Wikipedia page.

## What is AIC?

AIC stands for Akaike Information Criterion and measures the relative goodness of fit of a statistical model, taking into account the number of parameters. For the AIC, lower values indicate a better fit relative to the number of parameters. The AIC is useful for comparing different models for the same data, but it gives no absolute quality estimate. This also implies that AIC values for models of different data sets are not comparable. The BIC is a similar measure that penalizes the number of parameters more strongly than the AIC. For more information, see for example the Wikipedia page.

## What is BIC?

BIC stands for Bayesian Information Criterion (also called Schwarz criterion) and measures the relative goodness of fit of a statistical model, taking into account the number of parameters. For the BIC, lower values indicate a better fit relative to the number of parameters. The BIC is useful for comparing different models for the same data, but it gives no absolute quality estimate.
This also implies that BIC values for models of different data sets are not comparable. The AIC is a similar measure that penalizes the number of parameters less strongly than the BIC. For more information, see for example the Wikipedia page.

## Exponential Survival Model

The exponential survival model assumes that the probability of dying is constant over time. Thus survival decreases exponentially over time, giving the model its name. This model is the simplest survival model and it has one free parameter, the time-independent hazard rate a. In general the model does not fit survival in protected environments very well.

$$\text{Survival} = e^{-a t}$$

$$\text{Mortality} = \log(a)$$

## Weibull Survival Model

The Weibull survival model assumes that the probability of dying increases non-exponentially over time. The model has two parameters: the baseline hazard rate a and the rate of increase in mortality with time b.

$$\text{Survival} = e^{-(a t)^{b}}$$

$$\text{Mortality} = \log\left(a^{b}\, b\, t^{b - 1}\right)$$

Reference Publications:

- Pletcher SD, Khazaeli AA, Curtsinger JW. Why do life spans differ? Partitioning mean longevity differences in terms of age-specific mortality parameters. J Gerontol A Biol Sci Med Sci. 2000 Aug;55(8):B381-B389 doi:10.1093/gerona/55.8.B381 PubMed PMID: 10952359
- Pletcher SD. Model fitting and hypothesis testing for age-specific mortality data. J Evol Biol. 1999 May;12(3):430-439 doi:10.1046/j.1420-9101.1999.00058.x
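The exponential and Weibull formulas above can be sketched in a few lines of Python. This is only an illustration of the formulas as stated, not SurvCurv's fitting code; the function names and parameter values are hypothetical, and the "Mortality" expressions are treated as log-scale hazards.

```python
import math


def exponential_survival(t, a):
    """Survival = e^{-a t}: fraction still alive at time t."""
    return math.exp(-a * t)


def exponential_log_mortality(t, a):
    """Mortality formula of the exponential model, log(a); constant in t."""
    return math.log(a)


def weibull_survival(t, a, b):
    """Survival = e^{-(a t)^b}."""
    return math.exp(-((a * t) ** b))


def weibull_log_mortality(t, a, b):
    """Mortality formula of the Weibull model, log(a^b * b * t^(b-1)); needs t > 0."""
    return math.log(a ** b * b * t ** (b - 1))


if __name__ == "__main__":
    a, t = 0.05, 10.0  # arbitrary illustrative values, not fitted to any data
    # With b = 1 the Weibull formulas reduce to the exponential ones.
    print(weibull_survival(t, a, 1.0), exponential_survival(t, a))
    print(weibull_log_mortality(t, a, 1.0), exponential_log_mortality(t, a))
```

Setting b = 1 is a quick sanity check: both pairs of printed values should agree, since the Weibull model then has a constant hazard.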
## Gompertz Survival Model

The Gompertz survival model assumes that the probability of dying increases exponentially over time. The model has two parameters: the baseline hazard rate a and the rate of the exponential increase in mortality with time b. Note that parameter b of the Gompertz model influences mortality exponentially, while in the Weibull model it enters as a power of time.

$$\text{Survival} = e^{-\frac{a}{b}\left(e^{b t} - 1\right)}$$

$$\text{Mortality} = \log\left(a\, e^{b t}\right)$$

Reference Publications:

- Pletcher SD, Khazaeli AA, Curtsinger JW. Why do life spans differ? Partitioning mean longevity differences in terms of age-specific mortality parameters. J Gerontol A Biol Sci Med Sci. 2000 Aug;55(8):B381-B389 doi:10.1093/gerona/55.8.B381 PubMed PMID: 10952359
- Pletcher SD. Model fitting and hypothesis testing for age-specific mortality data. J Evol Biol. 1999 May;12(3):430-439 doi:10.1046/j.1420-9101.1999.00058.x
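As a rough check of how the two Gompertz formulas relate, the sketch below evaluates them in Python and compares the stated log-mortality with a numerical estimate of log(-d/dt log Survival). The function names, the step size and the parameter values are hypothetical choices for illustration only.

```python
import math


def gompertz_survival(t, a, b):
    """Survival = e^{-a/b * (e^{b t} - 1)}."""
    return math.exp(-a / b * (math.exp(b * t) - 1.0))


def gompertz_log_mortality(t, a, b):
    """Mortality formula of the Gompertz model, log(a * e^{b t})."""
    return math.log(a * math.exp(b * t))


def numerical_log_hazard(survival, t, h=1e-5):
    """Estimate log(-d/dt log S(t)) with a central difference of width 2h."""
    slope = (math.log(survival(t + h)) - math.log(survival(t - h))) / (2.0 * h)
    return math.log(-slope)


if __name__ == "__main__":
    a, b = 0.001, 0.1  # arbitrary illustrative values, not fitted to any data
    for t in (10.0, 30.0, 50.0):
        closed_form = gompertz_log_mortality(t, a, b)
        estimate = numerical_log_hazard(lambda x: gompertz_survival(x, a, b), t)
        print(t, closed_form, estimate)
```

On the log scale the Gompertz mortality is a straight line in t with slope b and intercept log(a), which is why the two parameters are often read directly off a log-mortality plot.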
## Gompertz-Makeham Survival Model

The Gompertz-Makeham survival model is an extension of the Gompertz survival model. Makeham's extension adds a time-independent term c to the mortality.

$$\text{Survival} = e^{-c t - \frac{a}{b}\left(e^{b t} - 1\right)}$$

$$\text{Mortality} = \log\left(c + a\, e^{b t}\right)$$

Reference Publications:

- Pletcher SD, Khazaeli AA, Curtsinger JW. Why do life spans differ? Partitioning mean longevity differences in terms of age-specific mortality parameters. J Gerontol A Biol Sci Med Sci. 2000 Aug;55(8):B381-B389 doi:10.1093/gerona/55.8.B381 PubMed PMID: 10952359
- Pletcher SD. Model fitting and hypothesis testing for age-specific mortality data. J Evol Biol. 1999 May;12(3):430-439 doi:10.1046/j.1420-9101.1999.00058.x
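To connect the Gompertz-Makeham formulas with the log-likelihood, AIC and BIC scores described earlier, here is a minimal Python sketch. It assumes the standard likelihood for right-censored data (deaths contribute log-hazard plus log-survival, censored observations only log-survival) and the textbook definitions AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L with n taken as the number of observations. All function names and the toy data are hypothetical, and the parameters are set by hand rather than fitted, so the numbers only illustrate the arithmetic, not SurvCurv's actual procedure.

```python
import math


def gm_log_survival(t, a, b, c):
    """log(Survival) = -c t - a/b * (e^{b t} - 1); c = 0 gives the Gompertz model."""
    return -c * t - a / b * (math.exp(b * t) - 1.0)


def gm_survival(t, a, b, c):
    """Survival of the Gompertz-Makeham model."""
    return math.exp(gm_log_survival(t, a, b, c))


def gm_log_mortality(t, a, b, c):
    """Mortality formula of the Gompertz-Makeham model, log(c + a * e^{b t})."""
    return math.log(c + a * math.exp(b * t))


def log_likelihood(observations, a, b, c):
    """Log-likelihood of (time, died) pairs; died=False marks a censored animal."""
    total = 0.0
    for t, died in observations:
        total += gm_log_survival(t, a, b, c)  # survived until time t
        if died:
            total += gm_log_mortality(t, a, b, c)  # and died at time t
    return total


def aic(loglik, k):
    """Akaike Information Criterion for k parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * loglik


if __name__ == "__main__":
    # Hypothetical toy cohort: (age at end of observation, died?)
    data = [(35.0, True), (42.0, True), (20.0, False), (51.0, True), (48.0, True)]
    gompertz = log_likelihood(data, a=0.001, b=0.1, c=0.0)   # 2 parameters
    makeham = log_likelihood(data, a=0.001, b=0.1, c=0.002)  # 3 parameters
    print("Gompertz:", gompertz, aic(gompertz, 2), bic(gompertz, 2, len(data)))
    print("Gompertz-Makeham:", makeham, aic(makeham, 3), bic(makeham, 3, len(data)))
```

Working with the log-survival directly, instead of taking the logarithm of the survival, avoids numerical underflow at high ages where the survival is very close to zero.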
## Logistic Survival Model

The Logistic survival model is a different extension of the Gompertz survival model. This extension allows modelling a deceleration in mortality at old age. The model has three parameters: a, the baseline hazard rate, and b, the rate of the exponential increase in mortality with time, analogous to the Gompertz survival model, plus a new parameter s. The parameter s describes the deceleration at advanced age, while at the same time representing the degree of heterogeneity in the population.

$$\text{Survival} = \left(1 + \frac{s a}{b}\left(e^{b t} - 1\right)\right)^{-1/s}$$

$$\text{Mortality} = \log\left(\frac{a\, e^{b t}}{1 + \frac{s a}{b}\left(e^{b t} - 1\right)}\right)$$

## Logistic-Makeham Survival Model

The Logistic-Makeham survival model is an extension of the Logistic survival model. Makeham's extension adds a time-independent term c to the mortality, in the same way as in the Gompertz-Makeham survival model.

$$\text{Survival} = e^{-c t}\left(1 + \frac{s a}{b}\left(e^{b t} - 1\right)\right)^{-1/s}$$

$$\text{Mortality} = \log\left(c + \frac{a\, e^{b t}}{1 + \frac{s a}{b}\left(e^{b t} - 1\right)}\right)$$

## Plotting Options

## Absolute Representations
## Difference Representations

Difference plots are newly defined plots introduced in the SurvCurv publication, specifically representing only the differences between a pair of cohorts, such as control and treatment or female and male. There are 4 difference plots based on the 4 absolute representations, each showing the difference instead of the absolute value. Thus, for difference survival curves, for example, positive values indicate a survival advantage of the treatment compared to the control and negative values a disadvantage. Difference plots can be based not only on survival curves but also on mortality curves, showing relative mortality differences. Here, a line below zero would indicate a lower mortality risk in the treatment.

All these difference plots also support the use of mathematical models in addition to or instead of survival data. This can be useful for exploring the differences between cohorts via the corresponding models. Alternatively, the differences between two different mathematical models of the same survival data can be visualised in the same way, highlighting the differences between the models.

## What does "connection style" mean?

The connection style refers to the way the gaps between the observations are handled. No matter how often observations are performed, they happen at discrete time intervals, e.g. daily, while the plot has a continuous axis, resulting in "missing" values, or gaps. The “connection style” defines how this discrepancy is handled:
## What output formats are supported?

Currently five different file formats for the graphical output are supported (SVG, PDF, PS, TIFF and PNG). Each of these formats has specific advantages and common uses. SVG, PDF and PS use vector graphics and are commonly welcomed by scientific journals as image formats, while TIFF and PNG are raster graphics, with PNG being the web-optimized default.
## How does "smoothing" work?

Here, smoothing is a sliding window smoothing with a selected window size. This means that each value is replaced with the average of the value itself, the x neighbours to the left and the x neighbours to the right. The smoothing is not applied to survival or death plots.

## What are historical controls?

Historical control data are commonly used in toxicity and carcinogenicity studies in rodents in addition to the parallel control group. They put the study in perspective relative to existing ones and serve as quality control for establishing the reasonableness of the current control group (see review reference below). We suggest that pooled historical controls would also be a valuable addition to the usual pair of measured control and measured treatment condition in ageing research.

We have defined a few potential historical control groups based on the annotated collection of survival measurements, which can be used directly. You can also define your own historical control using various criteria.

Reference: Keenan, C., et al. Best practices for use of historical control data of proliferative rodent lesions. Toxicol Pathol. 2009; 37(5):679-693 doi:10.1177/0192623309336154

## What does "automatically combine replicates" do?

For each selected cohort, this option searches the database for annotated replicates and uses the observations of all replicates together for the selected analyses. The annotated replicates present in the database are listed in the extended information of each cohort together with other related cohorts.

## How to build your own meta-cohort?

This function allows you to combine observations into a virtual cohort according to the criteria you specify and can be used, for example, to create your own historical controls. You start by specifying a name for your virtual cohort and one or more criteria to define which observations to include.
Possible criteria include species, strain, gender, study, date and treatment, but also specific cohort IDs separated by commas (ranges can be given as "A-B") or even a search term. You can add multiple virtual cohorts one by one to your plot using the "Add" function.

Please note that virtual cohorts you create will not be stored, and your definition will be lost when the session ends.

## Tests

## What is the log-rank test?

The log-rank test, also called Mantel-Cox test or Mantel-Haenszel test, assesses whether there is a difference between two or more survival curves. For more details on how the test works and its relation to other tests, the Wikipedia article is a good starting point. The log-rank test is more sensitive than the Wilcoxon test to differences between groups later in time.

The test report of SurvCurv shows for each cohort some details of the test statistics, including the number of observed events and the number of events expected under the hypothesis that all the cohorts are identical. Below these, the chi-squared test statistic, the degrees of freedom and the corresponding p-value are given. Finally, the overall test result based on the p-value is stated.

Reference publication: Harrington DP & Fleming TR. A class of rank test procedures for censored survival data. Biometrika. 1982; 69(3):553-566 doi:10.1093/biomet/69.3.553

## What is the generalized Wilcoxon test?

The generalized Wilcoxon test (here Prentice's generalization, which is essentially equivalent to Peto & Peto's generalization) tests whether there is a difference between two or more survival curves. The test is more sensitive than the log-rank test to differences between groups early in time.

The test report shows for each cohort some details of the test statistics, including the number of observed events and the number of events expected under the hypothesis that all the cohorts are identical.
Below these, the chi-squared test statistic, the degrees of freedom and the corresponding p-value are given. Finally, the overall test result based on the p-value is stated.

Reference publication: Harrington DP & Fleming TR. A class of rank test procedures for censored survival data. Biometrika. 1982; 69(3):553-566 doi:10.1093/biomet/69.3.553

## What is the Wang-Allison Score test?

The Wang-Allison Score test is a statistical test for differences between the maximal lifespans of two cohorts. To be more robust to sampling problems, i.e. problems caused by the limited number of observations, the longest living 10% are used instead of the longest recorded lifespan. For each cohort the test report shows the number of observations in the joint 10% longest lifespans, and the fraction of these compared to all observations in the cohort. This is followed by the total number of observations and the total in the joint 10% longest lifespans; here the fraction should be close to 0.10 (as we want the top 10%). Underneath the table, the p-value and the age corresponding to the split are given, followed by the overall test statement based on the p-value.

Original publication: Wang C, Li Q, Redden DT, Weindruch R, Allison DB. Statistical methods for testing effects on "maximum lifespan". Mech Ageing Dev. 2004 Sep;125(9):629-632. doi:10.1016/j.mad.2004.07.003 PubMed PMID: 15491681

## What is Fisher's exact test?

Fisher's exact test is a general statistical test to examine the significance of the association between two kinds of classifications, commonly represented in a contingency table. For testing the difference in survival between two cohorts A and B at a certain time point, we construct a 2×2 contingency table of cohort (A or B) against status at that time point (dead or alive).
The p-value is then computed from this table using the hypergeometric distribution. The test report shows for each pair of cohorts the p-value as well as the estimated odds ratio and its 95% confidence interval. The odds ratio is the odds of an event occurring in group A divided by the odds of it occurring in group B. Thus, if both groups have the same risk, the odds ratio is one. Values larger than one indicate a larger risk in group A, while values smaller than one indicate a higher risk in group B.

## Cox Proportional Hazards Analysis

## What is Cox Proportional Hazards?

The Cox proportional hazards model is a statistical model of survival data with one or more covariates or factors. It allows identifying which factors contribute significantly to the overall model and quantifying their influence. The model operates on hazard rates, also known as mortality rates, and assumes that the hazards of the different conditions are proportional, i.e. multiples of each other. This assumption should be checked (see cox.zph test and diagnostic plots below) and taken into account when interpreting the results.

Please note that our Cox online analysis currently does not support stratification or time-varying covariates.

Further information: Cox proportional hazards model on Wikipedia

Original publication: Cox, D. Regression models and life-tables. J Roy Statist Soc. Ser B (Methodological). 1972; 34, 187-220. JSTOR 2985181

## What does the result mean?

The Cox PH model returns a coefficient and a p-value for each covariate or factor. The exp(coef) gives the multiplier of the mortality rate, i.e. exp(coef) < 1 indicates a reduced mortality rate, corresponding to increased survival. The p-value indicates the significance of the result.

Please always check the proportional hazards assumption (see cox.zph test and diagnostic plots below)!

## What are interaction terms?

Interaction terms in a Cox PH model are additional factors for the co-occurrence of the potentially “interacting” covariates or factors.
They allow examining whether two covariates interact and in which way. Covariates might exhibit an effect only when co-occurring, e.g. the UAS-GAL4 expression system in Drosophila, or create a larger effect than the sum of the individual effects when co-occurring. In the first case only the interaction term shows a significant effect; in the second case the interaction term shows a significant effect in the same direction as the individual covariates. A third alternative is that one covariate inhibits the effect of another, which would be indicated by a significant interaction term with an effect opposing the individual one. If an interaction term is not significant, no evidence for an interaction between the factors was found.

## What is the cox.zph test?

The cox.zph test is a test for the proportional hazards assumption of a Cox model. It is a chi-square test between the Kaplan-Meier transformed survival times and the Schoenfeld residuals as proposed by Grambsch and Therneau.

The p-values of the proportional hazards tests, like those of any other test, are strongly dependent on the sample size. Gross violations may not be statistically significant if the sample size is very small, and even slight violations, causing negligible errors in the estimated coefficients, may be highly significant if the sample size is very large. An estimate of the size of the deviations from the assumed independence, i.e. no correlation, is given by the respective correlation coefficients rho, which can thus be helpful in interpreting the test results.

Original publication: Grambsch, P. and Therneau, T. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika, 1994, 81, 515-526

## What are these diagnostic plots?

The diagnostic plots are a graphical way to assess the proportional hazards assumption. They show the scaled Schoenfeld residuals plotted against the transformed time.
Additionally shown are a non-linear line fitted to the data (black solid) and a horizontal line (dotted blue) corresponding to the determined Cox coefficient. There should be no clear overall increasing or decreasing tendency, and the fitted non-linear line should roughly follow the dotted horizontal line. The example shown below is a very good case.

The example is the diagnostic plot of the UAS factor taken from the CoxPH analysis of SurvCurv:164, 166, 168, 170 (Ikeya et al., 2009) using factors for GAL4, UAS and the GAL4-UAS interaction.

Original publication of the used data: Ikeya T, Broughton S, Alic N, Grandison R & Partridge L (2009) The endosymbiont Wolbachia increases insulin/IGF-like signalling in Drosophila. Proc Biol Sci, 276(1674), 3799-3807

## What to do if CoxPH is violated?

If the proportional hazards assumption is violated with noticeable deviations, alternative or more complex analyses might be necessary. Options include Cox proportional hazards analysis with time-dependent covariates or transformed covariates, or Accelerated Failure Time (AFT) analysis. These are currently not available in SurvCurv. However, you can download the survival data of interest from the database and perform these analyses locally using your preferred statistics software, such as R.