1Shanghai Ocean University, Shanghai, China.
2Central Marine Research and Design Institute, Saint Petersburg, Russia.
*Corresponding Author: Kelin Wang
Shanghai Ocean University, Shanghai, China.
Email: m210811423@st.shou.edu.cn
This study validates a novel reliability method for biological and health systems, particularly suitable for complex multi-dimensional public health and biological systems, against the well-established bivariate bio-reliability method, which is known to accurately assess the extreme contours of bivariate dynamic systems. Contemporary reliability methods that deal with spatio-temporal observations cannot always cope easily with high-dimensional biosystems and the complex cross-correlations between different system components. The suggested methodology coped well with this challenge. This study aimed to assess the joint death-rate risks of cardiovascular diseases and cancers, at any given time horizon and in any region or country of interest, by applying the novel Gaidai-Yakimov reliability method to a raw clinical dataset. A multicentre, population-based, spatio-temporal analysis of a medical survey dataset, based on a novel bio-statistical method, has been adopted. Annual numbers of deaths associated with cancer and cardiovascular diseases, recorded in all 195 countries of the world, have been selected. Such phenomena have always been challenging to model, owing to bio-system non-stationarity, high dimensionality, and cross-couplings between different components of biological or public health systems. The main purpose of this study was to benchmark the state-of-the-art Gaidai-Yakimov technique, which enables efficient extraction of relevant statistical information from an underlying raw clinical dataset. The suggested methodology may be used for prognostics in various digital public health applications, based on their available raw clinical survey datasets.
CVD (cardiovascular disease) is a term used to refer to a range of human diseases affecting the blood vessels and heart. These include abnormal blood pressure (hypertension), heart attacks (coronary heart disease), strokes (cerebrovascular disease), heart failure, and numerous other related heart disorders. The NCI (National Cancer Institute) defines cancer as a collection of serious diseases in which abnormal cells begin to divide actively and then spread to neighbouring tissues. Cancers may arise in various parts of the human body, resulting in a wide range of cancer types, in some cases spreading to other key parts of the body through the lymph and blood systems. CVDs and cancers are the 2 leading causes of mortality around the globe. Considering the associated global risks to public health systems, it makes sense to study the bivariate statistics of CVDs and cancers jointly. This study considers public health and bio systems; it does not address the health issues of individual patients. It intends to contribute to contemporary public health and bio-system research rather than to clinical research; a potential rationale is the assessment of relative burdens on public health in various countries, given a time horizon of interest.
Statistical aspects of CVD and cancer risks, along with the statistically relevant methods, have been receiving extensive attention in the modern research community [1-14]; for CVD-related issues see [15-22]; for cancer-related issues see [33-51]. Generally, it is quite challenging to assess the risk factors and reliability of realistic bio and health systems, as well as to predict excessive in situ mortality risks under actual CVD and cancer conditions, using contemporary theoretical reliability and statistical methods [1,23-26]. This is typically due to the high dimensionality of bio and health systems, the large number of random factors governing their dynamics, and their spread over extensive terrains. In general, the reliability of complex biological, environmental or health systems may be assessed straightforwardly, either by having enough measurements or by performing extensive direct MC (Monte Carlo) simulations. For CVD and cancer, however, the available observational clinical datasets are often quite limited in time span, in locations covered, and in dataset size [1]. Given the above-described challenges, the authors have introduced here the novel Gaidai-Xing reliability methodology for bio and health systems, able to predict accurately excessive CVD and cancer mortality risks. This study has focused on the dynamics of CVD and cancer spread over the globe [1,4,22], with special focus on complex cross-correlations between various countries. The whole world has been chosen because of the extensive public health observations and the widely available related research.
Reliability and statistics provide a basis for lifetime modelling, typically using EVT (Extreme Value Theory), a popular approach in both medical research and bio-medical engineering. In [15], the authors proposed a novel approach utilizing PVF (Power Variance Function) copulas (Clayton, inverse Gaussian, Gumbel copulas, etc.), along with conditional sampling, for survival analysis. For recent studies discussing the distribution of upper bounds on life expectancy, see [16,19]. As not much statistical research has been done to assess contemporary risks associated with CVD and cancer, including numerous heart and blood vessel diseases, the authors introduce in this study a novel reliability method, able to analyse excessive mortality episodes and their spatio-temporal spread, providing better indications of, and insight into, the possible future spread of the above-mentioned diseases. In this study, CVD and cancer excessive mortality episodes have been viewed as unexpected (random) incidents that may occur in any region or country, at any time horizon; hence the spatial spread is well accounted for. A non-dimensional scaling factor λ has been introduced in order to predict the latter CVD and cancer risks, given a temporal return period and a specific spatial area of interest.
Bio and health systems are subjected to various ergodic and cyclic environmental influences. A bio-process can be viewed as being intricately dependent on a number of bio/environmental parameters, whose temporal variation may itself be approximated (modelled) as an ergodic or cyclic process.
The incidence dataset of CVD and cancer in 195 countries of the world, covering the three recent decades 1990-2019, has been obtained from a public website [1]. As this clinical dataset is given per country, the bio or health system under consideration may be regarded as an MDOF (Multi Degree of Freedom) dynamic bio-system with highly cross-correlated national/regional components (or dimensions). This study assists in managing the global risks of future excessive CVD and cancer mortality episodes by prognosticating them; hence it focuses only on annual, globally registered patient death numbers, not on disease symptoms. All countries of the world have been accounted for in this study [1] (Figure 1).
Although the dataset studied is only a limited cardio-cancer dataset, the proposed bio-reliability technique may well be applied to any well-documented disease.
Heart failure and cancer are the 2 main causes of death in developed countries. Even though these 2 clinical entities seem distinct, they share many of the same risk factors, symptoms, and pathophysiological pathways (inflammation, activation of the immunological and neurohormonal systems, metabolic abnormalities, endothelial dysfunction, etc.). Along with the well-known cardiotoxic effects of oncological and cardiac treatments, cancer and heart failure are assumed to be related by a bi-directional interaction, with each disease favoring the other (Figure 2).
Consider an MDOF public health or biological system's vector $(X(t),Y(t),Z(t),\dots)$, composed of the system's critical (key) components $X(t),Y(t),Z(t),\dots$, which have been either observed, measured or numerically simulated over a sufficiently long (representative) observation time period $(0,T)$. The unidimensional (1D) global maxima of the bio or health system's components are denoted here as $X_T^{\max}=\max_{0\le t\le T}X(t)$, $Y_T^{\max}=\max_{0\le t\le T}Y(t)$, $Z_T^{\max}=\max_{0\le t\le T}Z(t)$, $\dots$,
where by a sufficiently long (representative) time lapse $T$ we mean a $T$ that is large relative to the bio-dynamic system's autocorrelation and relaxation times. Let $X_1,\dots,X_{N_X}$ be the temporally consecutive local maxima of the biosystem component process $X=X(t)$, occurring at discrete, monotonically increasing times $t_1^X<\dots<t_{N_X}^X$
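within $(0,T)$. Analogous definitions follow for the other MDOF components $Y(t),Z(t),\dots$, namely $Y_1,\dots,Y_{N_Y}$; $Z_1,\dots,Z_{N_Z}$; etc. For the sake of simplicity, all local maxima of the bio or health system have been assumed to be positive. Then $1-P=\mathrm{Prob}\left(X_T^{\max}>\eta_X\cup Y_T^{\max}>\eta_Y\cup Z_T^{\max}>\eta_Z\cup\dots\right)$, with $$P=\iiint_{(0,0,0,\dots)}^{(\eta_X,\eta_Y,\eta_Z,\dots)}p_{X_T^{\max},Y_T^{\max},Z_T^{\max},\dots}\left(x_T^{\max},y_T^{\max},z_T^{\max},\dots\right)\,dx_T^{\max}\,dy_T^{\max}\,dz_T^{\max}\cdots\qquad(1)$$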
where $P$ is the probability of the dynamic biosystem's survival, given the critical (hazard) values of the biosystem components, denoted as $\eta_X,\eta_Y,\eta_Z,\dots$; $\cup$ is the logical union operator «or»; and $p_{X_T^{\max},Y_T^{\max},Z_T^{\max},\dots}$
being the biosystem's joint PDF (probability density function) of the individual components' maxima. When the biosystem's NDOF (Number of Degrees of Freedom) is large, it is not always practically feasible to assess directly the biosystem's joint PDF $p_{X_T^{\max},Y_T^{\max},Z_T^{\max},\dots}$
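and hence the biosystem's probability $P$ of survival. The latter probability $P$, and with it the bio or health system's expected lifetime, needs to be accurately assessed, following Eq. (1). The bio-system's 1D components $X,Y,Z,\dots$ are re-scaled and non-dimensionalized in the following manner: $$X\rightarrow\frac{X}{\eta_X},\quad Y\rightarrow\frac{Y}{\eta_Y},\quad Z\rightarrow\frac{Z}{\eta_Z},\ \dots\qquad(2)$$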
making all bio-system components non-dimensional, with the same failure/hazard limit λ=1. Next, the local maxima of the 1D biosystem components are merged into a single 1D, temporally non-decreasing synthetic biosystem vector $R=(R_1,R_2,\dots,R_N)$,
following the corresponding merged time vector $t_1\le\dots\le t_N$, $N\le N_X+N_Y+N_Z+\dots$ Each local maximum $R_j$ is actually attained by one of the bio-system components: either $X(t)$, or $Y(t)$, or $Z(t)$, etc. The synthetic biosystem vector $R$ incurs no data loss, see Figure 3.
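The scaling and merging steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names `local_maxima` and `synthetic_vector`, the strict-inequality peak detection, and the toy numbers are hypothetical choices made here. Simultaneous maxima from different components are kept as separate entries, so the sketch returns N equal to the sum of the component counts.

```python
import numpy as np

def local_maxima(t, x):
    """Return times and values of strict interior local maxima of x(t)."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    # A point is a local maximum if it exceeds both of its neighbours.
    idx = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    return t[idx], x[idx]

def synthetic_vector(components, thresholds):
    """Merge scaled local maxima of all components into one time-ordered vector.

    components: dict mapping a component name to a (times, values) pair.
    thresholds: dict mapping the same names to hazard levels eta (assumed > 0).
    """
    times, values = [], []
    for name, (t, x) in components.items():
        tm, xm = local_maxima(t, x)
        times.append(tm)
        values.append(xm / thresholds[name])  # scaling X -> X / eta_X
    times = np.concatenate(times)
    values = np.concatenate(values)
    order = np.argsort(times, kind="stable")  # non-decreasing merged times
    return times[order], values[order]

# Toy data (made up for illustration only).
t = [0, 1, 2, 3, 4]
demo_components = {"X": (t, [1, 3, 2, 4, 1]), "Y": (t, [2, 1, 5, 1, 2])}
demo_thresholds = {"X": 4.0, "Y": 10.0}
demo_times, demo_R = synthetic_vector(demo_components, demo_thresholds)
```

Because every scaled component shares the hazard limit 1, an exceedance of 1 by any entry of the merged vector signals that at least one component has crossed its own threshold.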
This section applies the Gaidai-Yakimov approach [61] to a bivariate (2D) random bio-process $Z(t)=(X(t),Y(t))$. This bivariate process is made up of the annually registered patient death numbers $X(t),Y(t)$ for CVD and cancer-related diseases respectively, which have been monitored synchronously within a representative time lapse $(0,T)$. Assume that $N$ equidistant discrete time instants $t_1,\dots,t_N$ within the observational time lapse $(0,T)$ were used to collect the samples (data points) $(X_1,Y_1),\dots,(X_N,Y_N)$. This study investigates the bivariate joint CDF (Cumulative Distribution Function) $P(\xi,\eta):=\mathrm{Prob}(\hat{X}_N\le\xi,\hat{Y}_N\le\eta)$ of the 2D vector $(\hat{X}_N,\hat{Y}_N)$, with components $\hat{X}_N=\max\{X_j;\,j=1,\dots,N\}$ and $\hat{Y}_N=\max\{Y_j;\,j=1,\dots,N\}$. The latter serves as an illustration of a dynamic two-dimensional (2D) biosystem. The critical (hazard) thresholds were determined using the unidimensional global maxima of the extreme bio or health system components, with specific return periods and risks/probabilities $p$. The two clinical (measured/observed) time series $X,Y$ were scaled following Eq. (2), resulting in both biosystem components having the same failure/hazard limit, equal to 1. Hence, by maintaining the assembled local maxima in temporally increasing order, the local maxima from each observed/measured time series were combined into a single 1D system vector $R=(\max\{X_1,Y_1\},\dots,\max\{X_N,Y_N\})$.
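For the synchronous bivariate case, the construction of the 1D vector and a crude check of how often it exceeds a level λ might look as follows. This is a hedged sketch: the helper names `bivariate_synthetic` and `empirical_exceedance` and the toy numbers are assumptions introduced here, and the plain empirical frequency shown is only a sanity check, not the Gaidai-Yakimov extrapolation itself.

```python
import numpy as np

def bivariate_synthetic(x, y, eta_x, eta_y):
    """Return R_j = max(X_j / eta_X, Y_j / eta_Y) for synchronous samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must be sampled at the same time instants")
    # After scaling, both components share the hazard limit 1.
    return np.maximum(x / eta_x, y / eta_y)

def empirical_exceedance(r, lam):
    """Fraction of entries of r strictly above the level lam."""
    r = np.asarray(r, dtype=float)
    return float(np.mean(r > lam))

# Toy data (made up for illustration only).
demo_x = [10, 30, 20, 40]
demo_y = [1, 2, 6, 3]
demo_r = bivariate_synthetic(demo_x, demo_y, eta_x=40.0, eta_y=5.0)
demo_p = empirical_exceedance(demo_r, lam=1.0)
```

With scarce data the empirical frequency becomes unreliable near the hazard level λ=1, which is the regime where an extrapolation technique is needed.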
Figure 4 presents an example of cardio-cancer data, given as time-series.
Figure 5 presents the bivariate (2D) contours obtained by the modified bivariate Weibull method for the cardio-cancer dataset. Figure 5 shows an intrinsic inaccuracy, owing to the specific copula choice within the modified bivariate Weibull method, as it fits several Gumbel copulas to the measured clinical dataset. For more information on the modified bivariate Weibull method, see [5-9]. The selected test point is located on the $p=10^{-1.5}$ contour line, and it was accurately verified by the modified bivariate Weibull method, given the bivariate failure point $\eta_X=44000$, $\eta_Y=65$. The corresponding estimate from the Gaidai-Yakimov approach was then compared at the same probability level $p=10^{-1.5}$, matching the latter 2D contour line. The 95% CI (Confidence Interval) predicted by the Gaidai-Yakimov method was seen to agree with the probability level estimated by the modified bivariate Weibull method [10-14]. Note that the above-described methodology, while novel, has the practical benefit of effectively utilizing measured/observed/simulated datasets, since it can handle the biosystem's multi-dimensionality and conduct accurate extrapolation based on relatively scarce raw clinical datasets.
The Poincaré plot considered here is the SODP (Second Order Difference Plot). The quality of a time-series dataset may be checked statistically using the SODP, by analysing the successive differences of the underlying dataset. The SODP assists statistical pattern recognition through consecutive differences in the underlying raw time-series data. The CEM (Circled Entropy Measurement) method calculates how the data fall over circled regions, see Figure 6. A MATLAB script for SODP feature models has been shared for practitioners [62].
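The successive-difference construction behind the SODP can be sketched as below. This is an illustrative Python sketch rather than the MATLAB script of [62]: the function names are hypothetical, and `fraction_within_radius` is a deliberately simplified stand-in for counting points in a circled region, not the CEM method itself.

```python
import numpy as np

def sodp_points(x):
    """Return SODP coordinates: (x[n+1] - x[n]) against (x[n+2] - x[n+1])."""
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("at least three samples are needed")
    d = np.diff(x)  # first-order successive differences
    return d[:-1], d[1:]

def fraction_within_radius(a, b, radius):
    """Share of SODP points lying inside a circle centred at the origin."""
    return float(np.mean(np.hypot(a, b) <= radius))

# Toy series (made up for illustration only).
demo_series = [1, 3, 2, 5, 5]
demo_a, demo_b = sodp_points(demo_series)
demo_share = fraction_within_radius(demo_a, demo_b, radius=2.5)
```

Comparing such shares for a reduced and a full dataset gives one simple way to judge whether the reduced dataset preserves the pattern of successive differences.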
Figure 6 presents the 2nd order SODP. Plots of this type may be used to identify patterns in a raw clinical dataset and to compare them with other datasets, for example within a framework of entropy-based artificial intelligence (AI) recognition methods [52]. Figure 6 highlights the relationship between the annually registered CVD and cancer patient death numbers. As an example, the SODP pattern extracted from a reduced dataset can be compared with the SODP pattern extracted from the full dataset, and a judgement about the representativity and data quality of the reduced dataset may be drawn. Whereas the current study assesses MDOF biosystem reliability by a sub-asymptotic method, the commonly used EVT is asymptotic and limited to 1DOF systems.
Classic contemporary health system reliability approaches cannot easily deal with observed clinical time series originating from complex bio-systems with high dimensionality and non-linear cross-correlations between various components. The key advantage of the Gaidai-Yakimov method proposed here is its capacity to analyse the reliability and risks of high-dimensional, non-linear dynamic bio and public health systems [53-60]. Despite its apparent simplicity, the current work presents an essentially novel multi-dimensional bio-modelling methodology, along with a methodological route to epidemic/chronic disease forecasting while the bio or health system has not yet reached its critical/hazard levels.
Traditional reliability methodologies for clinical time series do not readily handle highly dimensional systems and complicated cross-correlations between different bio-system components. The primary advantage of the suggested technique is its ability to analyse the reliability of high-dimensional dynamic systems. In this study, time series representing cardiovascular and cancer death rates from accessible clinical datasets were presented, and the theoretical support for the proposed approach has been developed. Despite the appeal of using direct measurements or extensive Monte Carlo simulations to analyse a biosystem's reliability, the complexity and high dimensionality of dynamic bio and health systems require the development of novel, accurate, yet robust techniques that can handle the limited clinical datasets currently available and utilize clinical data as effectively as possible. The method proposed in this article has previously been shown useful only for one-dimensional dynamic systems, when used with a range of simulation models; in general, forecasts turned out to be rather accurate. This study has focused on a multi-dimensional bio-reliability strategy that is general-purpose, trustworthy, and user-friendly. The suggested method can be applied in a number of biological, medical, and health sectors. The offered clinical example in no way limits the potential uses of the suggested technique.
Ethical approval: Not applicable.
Conflict of interests: Authors declare that they have no conflict of interest.
Authors' contribution: All authors contributed equally.
Funding: This study was funded by a grant from the Russian Science Foundation (RSF) No. 23-29-00933, "Development of a Probabilistic Simulation Model for Mechanical Interaction of Modern Ships with the Ice Cover to Ensure Safe Year-Round Arctic Navigation", https://rscf.ru/en/project/23-29-00933/.
Data availability: The datasets used and analysed during the current study are available from the corresponding author on reasonable request.