
[216] **viXra:1803.0399 [pdf]**
*submitted on 2018-03-21 22:17:16*

**Authors:** Vikas Ramachandra

**Comments:** 11 Pages.

In this paper, we propose the use of causal inference techniques for survival function estimation and prediction for subgroups of the data, up to individual units. Tree ensemble methods, specifically random forests, were modified for this purpose. A real-world healthcare dataset of about 1800 patients with breast cancer was used; it contains multiple patient covariates as well as disease-free survival days (DFS) and a binary death-event indicator (y). We use the type of curative cancer intervention as the treatment variable (T=0 or 1, the binary treatment case in our example).
The algorithm is a two-step approach. In step 1, we estimate heterogeneous treatment effects using a causalTree with the DFS as the dependent variable. In step 2, for each selected leaf of the causalTree with a distinctly different average treatment effect (with respect to survival), we fit a survival forest to all the patients in that leaf, one forest each for treatment T=0 and T=1, to get estimated patient-level survival curves for each treatment (more generally, any model can be used at this step).
We then subtract the patient-level survival curves to get the differential survival curve for a given patient, which compares the survival function under the two treatments.
The path to a selected leaf also gives the combination of patient features and their values that are causally important for the treatment effect difference at the leaf.
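The subtraction in the last step can be illustrated with a minimal sketch. Since the abstract notes that any model can be used in step 2, this sketch swaps in a plain Kaplan-Meier estimator per treatment arm instead of a survival forest, so it yields arm-level rather than patient-level curves. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (t, S(t)) at each distinct event time."""
    curve, surv = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def survival_at(curve, t):
    """Evaluate the step function S(t); S is 1 before the first event."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def differential_curve(curve_t1, curve_t0, grid):
    """S_{T=1}(t) - S_{T=0}(t) on a grid of time points."""
    return [(t, survival_at(curve_t1, t) - survival_at(curve_t0, t)) for t in grid]


# Synthetic toy data for one hypothetical leaf: DFS days and death indicator.
treated = kaplan_meier([5, 8, 12, 12, 15], [1, 0, 1, 1, 0])
control = kaplan_meier([2, 4, 4, 9, 10], [1, 1, 1, 0, 1])
diff = differential_curve(treated, control, grid=[0, 5, 10, 15])
for t, d in diff:
    print(f"t={t:>2}  S1(t) - S0(t) = {d:+.3f}")
```

A positive value at time t means the T=1 arm has the higher estimated survival probability at that time in this toy leaf.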

**Category:** Statistics

[215] **viXra:1803.0245 [pdf]**
*submitted on 2018-03-17 13:13:16*

**Authors:** Ilija Barukčić

**Comments:** 28 Pages. pp. 28. Copyright © 2017 by Ilija Barukčić, Jever, Germany. Published by:

Objective: This systematic review assesses the causal relationship between Mycobacterium avium subspecies paratuberculosis (MAP) and Crohn’s disease (CD).
Methods: A systematic review and meta-analysis of some impressive PCR-based studies is provided, aimed at answering, among other questions, the following: Is there a cause-effect relationship between Mycobacterium avium subspecies paratuberculosis and Crohn’s disease? The method of the conditio per quam relationship was used to test the hypothesis that the presence of Mycobacterium avium subspecies paratuberculosis guarantees the presence of Crohn’s disease. In other words, if Crohn’s disease is present, then Mycobacterium avium subspecies paratuberculosis is present too. The mathematical formula of the causal relationship k was used to test the hypothesis that there is a cause-effect relationship between Mycobacterium avium subspecies paratuberculosis and Crohn’s disease. Significance was indicated by a p-value of less than 0.05.
Result: The studies analyzed (number of cases and controls N=1076) were able to provide evidence that Mycobacterium avium subspecies paratuberculosis is a necessary condition (a conditio sine qua non) and a sufficient condition of Crohn’s disease. Furthermore, the studies analyzed provide impressive evidence of a cause-effect relationship between Mycobacterium avium subspecies paratuberculosis and Crohn’s disease.
Conclusion: Mycobacterium avium subspecies paratuberculosis is the cause of Crohn’s disease.

**Category:** Statistics

[214] **viXra:1802.0150 [pdf]**
*submitted on 2018-02-12 07:47:03*

**Authors:** Abdelmajid Ben Hadj Salem

**Comments:** 47 Pages. In French.

This is a short set of lectures on geostatistics, giving some elements of the field for third-year students of the Geomatics licence at the Faculty of Sciences of Tunis.

**Category:** Statistics

[213] **viXra:1802.0080 [pdf]**
*submitted on 2018-02-08 06:08:45*

**Authors:** Jesús Álvarez Lobo

**Comments:** 6 Pages. Journal Citation: arXiv:1602.03005v1

If a line randomly cuts two sides of a triangle, the length of the segment determined by the points of intersection is also random. The object of this study, applied to a particular case, is to calculate the probability that the length of this segment is greater than a certain value.

**Category:** Statistics

[212] **viXra:1801.0423 [pdf]**
*submitted on 2018-01-31 15:05:32*

**Authors:** Ilija Barukčić

**Comments:** 28 Pages. pp. 28. Copyright © 2017 by Ilija Barukčić, Jever, Germany. Published by:

Objective: Parvovirus B19 appears to be associated with several diseases, one of which appears to be systemic sclerosis. Still, there is no evidence of a causal link between parvovirus B19 and systemic sclerosis.
Methods: To explore the cause-effect relationship between parvovirus B19 and systemic sclerosis, a systematic review and re-analysis of the available and suitable studies was performed. The method of the conditio sine qua non relationship was used to test the hypothesis: without parvovirus B19 infection, no systemic sclerosis. The mathematical formula of the causal relationship k was used to test the hypothesis that there is a cause-effect relationship between parvovirus B19 and systemic sclerosis. Significance was indicated by a p-value of less than 0.05.
Result: The data analyzed support the null hypothesis that without parvovirus B19 infection there is no systemic sclerosis. In the same respect, the studies analyzed provide evidence of a (highly) significant cause-effect relationship between parvovirus B19 and systemic sclerosis.
Conclusion: This study supports the conclusion that parvovirus B19 is the cause of systemic sclerosis.
Keywords: Parvovirus B19, systemic sclerosis, causal relationship

**Category:** Statistics

[211] **viXra:1801.0307 [pdf]**
*submitted on 2018-01-24 02:06:11*

**Authors:** Feng Zhang, Jianjun Wang, Wendong Wang, Jianwen Huang, Changan Yuan

**Comments:** 24 Pages.

In this paper, we propose a novel nonconvex penalty function for compressed sensing using integral convolution approximation. It is well known that an unconstrained optimization criterion based on the $\ell_1$-norm easily underestimates the large components in signal recovery. Moreover, most methods perform well only when the measurement matrix satisfies the restricted isometry property (RIP) or only when it is highly coherent, two conditions that cannot hold at the same time. We introduce a new solver that addresses both of these concerns by adopting a difference-of-convex-functions framework with integral convolution approximation. To further boost the recovery performance, a weighted version is also provided. Experimental results on several signal reconstruction examples suggest the effectiveness and robustness of our methods in terms of success rate and signal-to-noise ratio (SNR).

**Category:** Statistics

[210] **viXra:1801.0171 [pdf]**
*submitted on 2018-01-14 16:15:21*

**Authors:** Ilija Barukčić

**Comments:** 37 pages. Copyright © 2018 by Ilija Barukčić, Horandstrasse, Jever, Germany. Published by:

Objective: Accumulating evidence indicates that the gut microbiome has an increasingly important role in human disease and health. Fusobacterium nucleatum has been identified in several studies as the leading gut bacterium present in colorectal cancer (CRC). Still, it is not clear whether Fusobacterium plays a causal role.
Methods: To explore the cause-effect relationship between Fusobacterium nucleatum and colorectal cancer, a systematic review and re-analysis of published studies was performed. The method of the conditio sine qua non relationship was used to test the hypothesis: without Fusobacterium nucleatum infection, no colorectal cancer. The mathematical formula of the causal relationship k was used to test the hypothesis that there is a cause-effect relationship between Fusobacterium nucleatum and colorectal cancer. Significance was indicated by a p-value of less than 0.05.
Result: The data analyzed support the null hypothesis that without Fusobacterium nucleatum infection there is no colorectal cancer. In the same respect, the studies analyzed provide evidence of a highly significant cause-effect relationship between Fusobacterium nucleatum and colorectal cancer.
Conclusion: The findings of this study suggest that Fusobacterium (nucleatum) is the cause of colorectal cancer.

**Category:** Statistics

[209] **viXra:1801.0148 [pdf]**
*submitted on 2018-01-12 23:14:43*

**Authors:** Robert Bennett

**Comments:** 14 Pages.

Biased statistics can arise from computational errors, belief in non-existent or unproven correlations, or acceptance of premises proven scientifically invalid.
It is the latter that will be examined here for the case of human life expectancy, whose values are well known and whose basic assumptions are virtually never challenged.
Whether the false premises are accidental (a case of overlooking the obvious) or deliberate distortions serving a subliminal agenda is beyond the scope of this analysis.

**Category:** Statistics

[208] **viXra:1801.0045 [pdf]**
*submitted on 2018-01-04 23:57:59*

**Authors:** Jason Hou-Liu

**Comments:** Pages.

Latent Dirichlet Allocation (LDA) is a generative model describing the observed data as being composed of a mixture of underlying unobserved topics, as introduced by Blei et al. (2003). A key hyperparameter of LDA is the number of underlying topics *k*, which must be estimated empirically in practice. Selecting the appropriate value of *k* is essentially selecting the correct model to represent the data, an important issue concerning goodness of fit. In the current work, we examine a series of metrics from the literature on a quantitative basis by benchmarking them against a generated dataset with a known value of *k*, and we evaluate the ability of each metric to recover the true value across multiple levels of topic resolution in the Dirichlet prior distributions. Finally, we introduce a new metric and heuristic for estimating *k* and demonstrate improved performance over existing metrics from the literature on several benchmarks.

**Category:** Statistics

[207] **viXra:1712.0504 [pdf]**
*submitted on 2017-12-18 15:14:48*

**Authors:** Samir Ait-Amrane

**Comments:** 12 Pages. In French

In this paper, we will explain some basic notions of statistics, first in the case of one variable, then in the case of two variables, while organizing ideas and drawing a parallel between some statistical and probabilistic formulas that are alike. We will also say a brief word about econometrics, time series and stochastic processes and provide some bibliographical references where these notions are explained clearly.

**Category:** Statistics

[206] **viXra:1712.0499 [pdf]**
*submitted on 2017-12-18 06:33:01*

**Authors:** Jason Lind

**Comments:** 2 Pages.

Development of a novel closed form, for integer bounds, of the truncated distribution, and an application of it to a weighting function that favors values farther from the origin.

**Category:** Statistics

[205] **viXra:1712.0429 [pdf]**
*submitted on 2017-12-14 01:05:01*

**Authors:** Cres Huang

**Comments:** Pages.

Viewing the random motions of objects, an observer might think there is a 50-50 chance that an object moves toward or away from them. This may be intuitive, but it is far from the truth. This study derives the probability functions of the Doppler blueshift and redshift effects in signal detection.
In fact, Doppler redshift detection is highly dominant in space, surface, and linear observation. Assuming no quality loss of radiation over distance and an observer with perfect vision, the probability of detecting redshift is more than 92% in three-dimensional observation, 87% on a surface, and 75% along a line. In cosmic observation, on average only 7.81% of the observers in the universe will detect a blueshift of radiation from any given object; the remaining 92.19% will detect a redshift. This is universal for all observers, aliens or Earthlings, at all locations in the universe.

**Category:** Statistics

[204] **viXra:1712.0244 [pdf]**
*submitted on 2017-12-07 05:24:57*

**Authors:** Luca Martino

**Comments:** 47 Pages.

Many applications in signal processing require the estimation of some parameters of interest given a set of observed data. More specifically, Bayesian inference needs the computation of a-posteriori estimators, which are often expressed as complicated multi-dimensional integrals. Unfortunately, analytical expressions for these estimators cannot be found in most real-world applications, and Monte Carlo methods are the only feasible approach. A very powerful class of Monte Carlo techniques is formed by the Markov Chain Monte Carlo (MCMC) algorithms. They generate a Markov chain whose stationary distribution coincides with the target posterior density. In this work, we perform a thorough review of MCMC methods that use multiple candidates to select the next state of the chain at each iteration. Compared with the classical Metropolis-Hastings method, the use of multiple-try techniques fosters the exploration of the sample space. We present different Multiple Try Metropolis schemes, Ensemble MCMC methods, Particle Metropolis-Hastings algorithms and the Delayed Rejection Metropolis technique. We highlight limitations, benefits, connections and differences among the different methods, and compare them by numerical simulations.
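To make the idea of selecting the next state from multiple candidates concrete, here is a minimal sketch of one well-known Multiple Try Metropolis variant: a symmetric Gaussian random-walk proposal with candidate weights equal to the target density. It is an illustration under those assumptions, not the specific schemes reviewed in the paper; the standard normal target, the function names and the tuning values are all hypothetical choices.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of the toy target (standard normal)."""
    return -0.5 * x * x


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mtm_step(x, n_tries, step, rng):
    """One Multiple Try Metropolis transition from state x."""
    # 1. Draw several candidates from a symmetric proposal centred at x.
    candidates = [rng.gauss(x, step) for _ in range(n_tries)]
    log_w = [log_target(y) for y in candidates]
    # 2. Select one candidate with probability proportional to its weight.
    m = max(log_w)
    y = rng.choices(candidates, weights=[math.exp(v - m) for v in log_w])[0]
    # 3. Draw n_tries - 1 reference points around y; the last one is x itself.
    refs = [rng.gauss(y, step) for _ in range(n_tries - 1)] + [x]
    log_w_ref = [log_target(r) for r in refs]
    # 4. Accept y with probability min(1, sum(w_candidates) / sum(w_refs)).
    log_ratio = log_sum_exp(log_w) - log_sum_exp(log_w_ref)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return y
    return x


def run_chain(n_samples, n_tries=5, step=2.0, seed=0):
    rng = random.Random(seed)
    x, samples = 0.0, []
    for _ in range(n_samples):
        x = mtm_step(x, n_tries, step, rng)
        samples.append(x)
    return samples


samples = run_chain(20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"sample mean ~ {mean:.3f}, sample variance ~ {var:.3f}")
```

With `n_tries=1` the step reduces to ordinary random-walk Metropolis, which makes the two easy to compare on the same target.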

**Category:** Statistics

[203] **viXra:1712.0110 [pdf]**
*submitted on 2017-12-04 22:02:28*

**Authors:** D Williams

**Comments:** 8 Pages.

A possible alternative (and non-standard) model of probability is presented based on non-standard "dx-less" integrals. The possibility of other such models is discussed.

**Category:** Statistics

[202] **viXra:1712.0018 [pdf]**
*submitted on 2017-12-03 02:40:26*

**Authors:** Ilija Barukčić

**Comments:** 50 pages. Copyright © 2017 by Ilija Barukčić, Horandstrasse, Jever, Germany. Published by:

Objective: Many times a positive relationship between Helicobacter pylori infection and gastric cancer has been reported, yet findings are inconsistent.
Methods: A literature search in PubMed was performed to re-evaluate the relationship between Helicobacter pylori (HP) and carcinoma of the human stomach. Case-control studies with at least 500 participants were considered for a review and meta-analysis. The meta-/re-analysis was conducted using the conditio sine qua non relationship and the causal relationship k. Significance was indicated by a p-value of less than 0.05.
Result: All studies analyzed provide impressive evidence of a cause-effect relationship between H. pylori and gastric cancer (GC). Two major studies were able to prove that H. pylori is a necessary condition of human gastric cancer. In other words, without H. pylori infection there is no human gastric cancer.
Conclusion: Our findings indicate that Helicobacter pylori (H. pylori) is the cause of gastric carcinoma.

**Category:** Statistics

[201] **viXra:1711.0437 [pdf]**
*submitted on 2017-11-26 08:58:37*

**Authors:** Ilija Barukčić

**Comments:** 57 pages. Copyright © 2017 by Ilija Barukčić, Horandstrasse, Jever, Germany. Published by:

Objective: A series of different studies detected Human papillomavirus (HPV) in malignant and nonmalignant prostate tissues. However, the results of studies on the relationship between HPV infections and prostate cancer (PCa) remain controversial.
Methods: A systematic review and re-analysis of some polymerase chain reaction (PCR)-based case-control studies was performed, aimed at answering the following question: Is there a cause-effect relationship between human papillomavirus (HPV) and prostate cancer? The method of the conditio per quam relationship was used to test the hypothesis: if human papillomavirus (HPV) is present in human prostate tissue, then prostate carcinoma is present. The mathematical formula of the causal relationship k was used to test the hypothesis that there is a cause-effect relationship between human papillomavirus (HPV) and prostate cancer. Significance was indicated by a p-value of less than 0.05.
Result: Only one of the studies analyzed failed to provide evidence of a cause-effect relationship between human papillomavirus (HPV) and prostate cancer. Two studies were highly significant on this point. The majority of the studies analyzed support the hypothesis that human papillomavirus (HPV) is a sufficient condition of prostate cancer. In other words, if human papillomavirus (HPV) is present in human prostate tissue, then prostate cancer is present.
Conclusion: Human papillomavirus (HPV) is a cause of prostate cancer.

**Category:** Statistics

[200] **viXra:1711.0339 [pdf]**
*submitted on 2017-11-18 05:44:37*

**Authors:** Ilija Barukčić

**Comments:** Pages.

Objective: Cervical cancer is the second most prevalent cancer in females worldwide. Infection with human papillomavirus (HPV) is regarded as the main risk factor of cervical cancer. Our objective was to conduct a qualitative systematic review of some case-control studies to examine the role of human papillomavirus (HPV) in the development of human cervical cancer beyond any reasonable doubt.
Methods: We conducted a systematic review and re-analysis of some impressive key studies, aimed at answering the following question: Is there a cause-effect relationship between human papillomavirus (HPV) and cervical cancer? The method of the conditio sine qua non relationship was used to test the hypothesis that the presence of human papillomavirus (HPV) guarantees the presence of cervical carcinoma. In other words, if human papillomavirus (HPV) is present, then cervical carcinoma is present too. The mathematical formula of the causal relationship k was used to test the hypothesis that there is a cause-effect relationship between human papillomavirus (HPV) and cervical carcinoma. Significance was indicated by a p-value of less than 0.05.
Result: One study was able to provide strict evidence that human papillomavirus (HPV) is a conditio sine qua non (a necessary condition) of cervical carcinoma while the other studies analyzed failed on this point. The studies analyzed provide impressive evidence of a cause-effect relationship between human papillomavirus (HPV) and cervical carcinoma.
Conclusion: Human papillomavirus (HPV) is the cause of cervical carcinoma.

**Category:** Statistics

[199] **viXra:1711.0326 [pdf]**
*submitted on 2017-11-15 10:17:45*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 2 Pages.

In this research investigation, the author has detailed a novel definition of Standard Deviation.

**Category:** Statistics

[198] **viXra:1711.0277 [pdf]**
*submitted on 2017-11-11 10:41:48*

**Authors:** Russell Leidich

**Comments:** 11 Pages.

Herein we present the “surround” function, which is intended to produce a set of “surround codes” which enhance the sparsity of integer sets which have discrete derivatives of lesser Shannon entropy than the sets themselves. In various cases, the surround function is expected to provide further entropy reduction beyond that provided by straightforward delta (difference) encoding alone.

We then present the simple concept of “densification”, which facilitates the elimination of entropy overhead due to masks (symbols) which were considered possible but do not actually occur in a given mask list (set of symbols).

Finally we discuss the ramifications of these techniques for the sake of enhancing the speed and sensitivity of various entropy scans.

[197] **viXra:1711.0248 [pdf]**
*submitted on 2017-11-08 10:41:26*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 3 Pages.

In this research investigation, the author has detailed a novel method of finding the ‘Total Intra Similarity And Dissimilarity Measure For The Values Taken By A Parameter Of Concern’. The advantage of such a measure is that using this measure we can clearly distinguish the contribution of Intra aspect variation and Inter aspect variation when both are bound to occur in a given phenomenon of concern. This measure provides the same advantages as that provided by the popular F-Statistic measure.

**Category:** Statistics

[196] **viXra:1710.0311 [pdf]**
*submitted on 2017-10-28 05:14:06*

**Authors:** Ilija Barukčić

**Comments:** 11 pages. Copyright © 2017 by Ilija Barukčić, Jever, Germany. Published by:

Background: The aim of this study is to work out a possible relationship between human papillomavirus (HPV) and malignant melanoma.
Objectives: This systematic review and re-analysis of the available retrospective study by Roussaki-Schulze et al., covering twenty-eight melanoma biopsy specimens and a control group of 6 patients, is performed so that some new inferences can be drawn.
Materials and methods: Roussaki-Schulze et al. obtained data from twenty-eight human melanoma biopsy specimens and from six healthy individuals. The presence and types of HPV DNA within the biopsy specimens were determined by polymerase chain reaction (PCR). Statistical analysis: In contrast to Roussaki-Schulze et al., the method of the conditio per quam relationship was used to test the hypothesis that the presence of human papillomavirus (HPV) guarantees the presence of malignant melanoma. In other words, if human papillomavirus (HPV) is present, then malignant melanoma must also be present. The mathematical formula of the causal relationship k was used to test the hypothesis that there is a cause-effect relationship between human papillomavirus (HPV) and malignant melanoma. Significance was indicated by a p-value of less than 0.05.
Results: Based on the data published by Roussaki-Schulze et al., we were able to provide evidence that the presence of human papillomavirus (HPV) guarantees the presence of malignant melanoma. In other words, human papillomavirus (HPV) is a conditio per quam of malignant melanoma. Contrary to expectation, the data of Roussaki-Schulze et al., based on a very small sample size, failed to provide significant evidence that human papillomavirus (HPV) is a cause or the cause of malignant melanoma.
Conclusions: Human papillomavirus (HPV) is a conditio per quam of malignant melanoma.

**Category:** Statistics

[195] **viXra:1710.0261 [pdf]**
*submitted on 2017-10-22 23:43:04*

**Authors:** Russell Leidich

**Comments:** 20 Pages. Released under the following license: https://creativecommons.org/licenses/by/4.0

The Jensen-Shannon divergence (JSD) quantifies the “information distance” between a pair of probability distributions. (A more generalized version, which is beyond the scope of this paper, is given in [1]. It extends this divergence to arbitrarily many such distributions. Related divergences are presented in [2], which is an excellent summary of existing work.)
A couple of novel applications for this divergence are presented herein, both of which involve sets of whole numbers constrained by some nonzero maximum value. (We’re primarily concerned with discrete applications of the JSD, although it’s defined for analog variables.) The first of these, which we can call the “Jensen-Shannon divergence transform” (JSDT), involves a sliding “sweep window” whose JSD with respect to some fixed “needle” is evaluated at each step as said window moves from left to right across a superset called a “haystack”.
The second such application, which we can call the “Jensen-Shannon exodivergence transform” (JSET), measures the JSD between a sweep window and an “exosweep”, that is, the haystack minus said window, at all possible locations of the latter. The JSET turns out to be exceptionally good at detecting anomalous contiguous subsets of a larger set of whole numbers.
We then investigate and attempt to improve upon the shortcomings of the JSD and the related Kullback-Leibler divergence (KLD).
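As a rough illustration of the sliding-window idea described above, the following sketch computes the standard base-2 Jensen-Shannon divergence between value histograms and sweeps a window across a haystack. It assumes the window has the same length as the needle and that distributions are plain frequency histograms over the values 0 to `max_value`; the paper's exact definitions may differ, and all function names are hypothetical.

```python
import math
from collections import Counter


def histogram(values, max_value):
    """Relative frequency of each whole number 0..max_value in values."""
    counts = Counter(values)
    n = len(values)
    return [counts.get(v, 0) / n for v in range(max_value + 1)]


def kld(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence in bits, between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


def jsd_sweep(haystack, needle, max_value):
    """JSD between the needle and each window of the haystack, left to right."""
    width = len(needle)
    needle_hist = histogram(needle, max_value)
    return [
        jsd(histogram(haystack[i:i + width], max_value), needle_hist)
        for i in range(len(haystack) - width + 1)
    ]


scores = jsd_sweep([0, 0, 3, 3, 0, 0], needle=[3, 3], max_value=3)
best = min(range(len(scores)), key=scores.__getitem__)
print("window offset most similar to the needle:", best)
```

Low values mark windows whose value distribution resembles the needle; the mixture `m` is nonzero wherever `p` or `q` is, so the logarithms are always defined.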

**Category:** Statistics

[194] **viXra:1710.0243 [pdf]**
*submitted on 2017-10-22 16:38:41*

**Authors:** Paris Samuel Miles-Brenden

**Comments:** 3 Pages. None.

None.

**Category:** Statistics

[193] **viXra:1709.0422 [pdf]**
*submitted on 2017-09-28 22:11:39*

**Authors:** 张枫;王建军

**Comments:** 10 Pages.

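Compressed sensing is one of the active research topics in (approximately) sparse signal processing; it breaks through the Nyquist/Shannon sampling rate and achieves efficient acquisition and robust reconstruction of signals. This paper uses the ℓ2/ℓ1 minimization method and Block D-RIP theory to study block-sparse signals under redundant tight frames. The results show that when the Block D-RIP constant δ2k| satisfies 0 < δ2k| < 0.2, the ℓ2/ℓ1 minimization method can robustly reconstruct the original signal, which also improves on existing reconstruction conditions and error upper bounds. Using a discrete Fourier transform (DFT) dictionary, we carried out a series of simulation experiments that fully confirm the theoretical results.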

**Category:** Statistics

[192] **viXra:1709.0359 [pdf]**
*submitted on 2017-09-23 21:26:31*

**Authors:** Jianwen Huang, Jianjun Wang, Wendong Wang

**Comments:** 12 Pages.

In this work, the sufficient condition for the recovery of block sparse signals that satisfy $b=\Phi x+\xi$ is investigated. We prove that every block s-sparse signal can be reconstructed by the $l_2/l_1$-minimization method in the noise-free situation and is stably reconstructed in the noisy measurement situation, if the sensing matrix fulfils the restricted isometry property with $\delta_{ts|\mathcal{I}}<t/(4-t)$ as $0<t<4/3$, $ts\geq2.$

**Category:** Statistics

[191] **viXra:1708.0118 [pdf]**
*submitted on 2017-08-11 06:30:03*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 1 Page.

In this research investigation, the author has detailed a Novel Technique of finding the Centroid of a given Set of Numbers in Any Prime Metric Basis of concern.

**Category:** Statistics

[190] **viXra:1708.0113 [pdf]**
*submitted on 2017-08-11 00:57:36*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 1 Page.

In this Technical Note, the author has detailed the evaluation of the Centroid of a given set of numbers in Prime Metric Basis.

**Category:** Statistics

[189] **viXra:1707.0315 [pdf]**
*submitted on 2017-07-24 15:57:44*

**Authors:** Zhicheng Chen

**Comments:** 6 Pages.

Data loss is a major problem in many online monitoring systems and can occur for various reasons. Copula-based approaches are effective methods for missing data imputation; however, such methods are highly dependent on a reliable distribution of the missing data. This article proposes a functional regression approach for imputing missing probability density functions (PDFs). PDFs are first transformed to a Hilbert space by the log quantile density (LQD) transformation. The transformed response PDFs are approximated by the truncated Karhunen–Loève representation. The corresponding Hilbert-space representation of a missing PDF is estimated by a vector-on-function regression model in a reproducing kernel Hilbert space (RKHS) and then mapped back to the density space by the inverse LQD transformation to obtain an imputation for the missing PDF. To address errors caused by the numerical integration in the inverse LQD transformation, the original PDFs are aided by the PDF of a uniform distribution. The effect of the added uniform distribution on the imputed result of a missing PDF can be separated out by the warping function-based PDF estimation technique.

**Category:** Statistics

[188] **viXra:1707.0269 [pdf]**
*submitted on 2017-07-19 19:24:50*

**Authors:** James P. Long, Rafael S. De Souza

**Comments:** 9 Pages. To appear in Wiley StatsRef: Statistics Reference Online

We present a review of data types and statistical methods often encountered in astronomy. The aim is to provide an introduction to statistical applications in astronomy for statisticians and computer scientists. We highlight the complex, often hierarchical, nature of many astronomy inference problems and advocate for cross-disciplinary collaborations to address these challenges.

**Category:** Statistics

[187] **viXra:1706.0551 [pdf]**
*submitted on 2017-06-30 03:09:09*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 7 Pages.

In this research investigation, the author has presented a Recursive Past Equation and a Recursive Future Equation based on the Ananda-Damayanthi Normalized Similarity Measure considered to Exhaustion [1] (please see the addendum of [1] as well).

**Category:** Statistics

[186] **viXra:1706.0379 [pdf]**
*submitted on 2017-06-19 00:46:33*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 7 Pages.

In this research investigation, the author has presented a Recursive Past Equation and a Recursive Future Equation based on the Ananda-Damayanthi Normalized Similarity Measure considered to Exhaustion [1] (please see the addendum of [1] as well).

**Category:** Statistics

[185] **viXra:1706.0295 [pdf]**
*submitted on 2017-06-16 05:25:45*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 2 Pages.

In this research investigation, the author has presented an Advanced Forecasting Model.

**Category:** Statistics

[184] **viXra:1706.0279 [pdf]**
*submitted on 2017-06-12 12:58:17*

**Authors:** Raymond Gallucci

**Comments:** 10 Pages.

Since publication of NUREG/CR-6850 (EPRI 1011989), EPRI/NRC-RES Fire PRA Methodology for Nuclear Power Facilities in 2005, phenomenological modeling of fire growth to peak heat release rate (HRR) for electrical enclosure fires in nuclear power plant probabilistic risk assessment (PRA) has typically assumed an average 12-minute rise time. One previous analysis using the data from NUREG/CR-6850 from which this estimate derived (Gallucci, “Statistical Characterization of Cable Electrical Failure Temperatures Due to Fire, with Simulation of Failure Probabilities”) indicated that the time to peak HRR could be represented by a gamma distribution with alpha (shape) and beta (scale) parameters of 8.66 and 1.31, respectively. Completion of the test program by the US Nuclear Regulatory Commission (USNRC) for electrical enclosure heat release rates, documented in NUREG/CR-7197, Heat Release Rates of Electrical Enclosure Fires (HELEN-FIRE) in 2016, has provided substantially more data from which to characterize this growth time to peak HRR. From these, the author develops probabilistic distributions that enhance the original NUREG/CR-6850 results for both qualified (Q) and unqualified cables (UQ). The mean times to peak HRR are 13.3 and 10.1 min for Q and UQ cables, respectively, with a mean of 12.4 min when all data are combined, confirming that the original NUREG/CR-6850 estimate of 12 min was quite reasonable.
Via statistical-probabilistic analysis, the author shows that the time to peak HRR for Q and UQ cables can again be well represented by gamma distributions with alpha and beta parameters of 1.88 and 7.07, and 3.86 and 2.62, respectively. Working with the gamma distribution for All cables given the two cable types, the author performs simulations demonstrating that manual non-suppression probabilities, on average, are 30% and 10% higher than the use of a 12-min point estimate when the fire is assumed to be detected at its start and halfway between its start and the time it reaches its peak, respectively. This suggests that adopting a probabilistic approach enables more realistic modeling of this particular fire phenomenon (growth time).
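The quoted gamma parameters can be checked against the quoted mean times, since a gamma distribution with shape alpha and scale beta has mean alpha × beta. The sketch below does that arithmetic and draws samples with Python's `random.gammavariate`, whose second argument is also a scale parameter. The dictionary layout and function names are illustrative assumptions; only the parameter values come from the abstract.

```python
import random

# (alpha = shape, beta = scale) as quoted in the abstract, in minutes.
PARAMS = {
    "earlier NUREG/CR-6850 fit": (8.66, 1.31),
    "Q cables": (1.88, 7.07),
    "UQ cables": (3.86, 2.62),
}


def gamma_mean(alpha, beta):
    """Mean of a gamma distribution with shape alpha and scale beta."""
    return alpha * beta


def simulated_mean(alpha, beta, n=100_000, seed=0):
    """Monte Carlo estimate of the mean time to peak HRR."""
    rng = random.Random(seed)
    return sum(rng.gammavariate(alpha, beta) for _ in range(n)) / n


for label, (alpha, beta) in PARAMS.items():
    print(f"{label}: analytic mean {gamma_mean(alpha, beta):.2f} min, "
          f"simulated {simulated_mean(alpha, beta):.2f} min")
```

The products for Q and UQ cables come out at about 13.3 and 10.1 minutes, matching the means stated in the abstract.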

**Category:** Statistics

[183] **viXra:1706.0191 [pdf]**
*submitted on 2017-06-15 02:21:37*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 10 Pages.

In this research investigation, the author has presented a Recursive Past Equation and a Recursive Future Equation based on the Ananda-Damayanthi Normalized Similarity Measure considered to Exhaustion [1] (please see the addendum of [1] as well).

**Category:** Statistics

[182] **viXra:1706.0190 [pdf]**
*submitted on 2017-06-15 03:22:48*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 8 Pages.

**Category:** Statistics

[181] **viXra:1706.0017 [pdf]**
*submitted on 2017-06-03 04:27:45*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 10 Pages.

In this research investigation, the author has presented a Recursive Past Equation and a Recursive Future Equation based on the Ananda-Damayanthi Normalized Similarity Measure considered to Exhaustion [1].

**Category:** Statistics

[180] **viXra:1705.0463 [pdf]**
*submitted on 2017-05-30 05:50:38*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 14 Pages.

In this research investigation, the author has presented a Recursive Past Equation and a Recursive Future Equation based on the Ananda-Damayanthi Normalized Similarity Measure considered to Exhaustion [1].

**Category:** Statistics

[179] **viXra:1705.0407 [pdf]**
*submitted on 2017-05-28 23:37:47*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 13 Pages.

In this research investigation, the author has presented a Recursive Past Equation and a Recursive Future Equation based on the Ananda-Damayanthi Normalized Similarity Measure considered to Exhaustion [1].

**Category:** Statistics

[178] **viXra:1705.0402 [pdf]**
*submitted on 2017-05-28 04:09:15*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 12 Pages.

**Category:** Statistics

[177] **viXra:1705.0396 [pdf]**
*submitted on 2017-05-28 01:50:06*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 8 Pages.

**Category:** Statistics

[176] **viXra:1705.0296 [pdf]**
*submitted on 2017-05-20 04:45:05*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 3 Pages.

**Category:** Statistics

[175] **viXra:1705.0128 [pdf]**
*submitted on 2017-05-07 09:59:15*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 2 Pages.

In this research investigation, the author has presented a Recursive Past Equation and a Recursive Future Equation based on the Ananda-Damayanthi Similarity Measure and its series considered to Exhaustion [1].

**Category:** Statistics

[174] **viXra:1705.0127 [pdf]**
*submitted on 2017-05-07 11:13:19*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 2 Pages.

In this research investigation, the author has presented a Recursive Past Equation and a Recursive Future Equation based on the Ananda-Damayanthi Similarity Measure considered to Exhaustion [1].

**Category:** Statistics

[173] **viXra:1705.0106 [pdf]**
*submitted on 2017-05-05 03:43:32*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 2 Pages.

In this research investigation, the author has presented a Recursive Future Equation based on the Ananda-Damayanthi Normalized Similarity Measure [1].

**Category:** Statistics

[172] **viXra:1705.0098 [pdf]**
*submitted on 2017-05-03 13:47:48*

**Authors:** G. Healey, S. Zhao, D. Brooks

**Comments:** 6 Pages.

Given the speed and movement for pitches thrown by a set of pitchers, we develop a measure of pitcher similarity.

**Category:** Statistics

[171] **viXra:1705.0097 [pdf]**
*submitted on 2017-05-03 13:55:04*

**Authors:** G. Healey, S. Zhao, D. Brooks

**Comments:** 7 Pages.

Tables of the most similar pitcher matches for 2016.

**Category:** Statistics

[170] **viXra:1705.0093 [pdf]**
*submitted on 2017-05-04 04:36:02*

**Authors:** L. Martino

**Comments:** 7 Pages.

Monte Carlo (MC) methods have become very popular in signal processing during the past decades. Adaptive rejection sampling (ARS) algorithms are well-known MC techniques that efficiently draw independent samples from univariate target densities. The ARS schemes yield a sequence of proposal functions that converge toward the target, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. We propose the Parsimonious Adaptive Rejection Sampling (PARS) method, where a better trade-off between acceptance rate and proposal complexity is obtained. Thus, the resulting algorithm is faster than the standard ARS approach.

**Category:** Statistics

[169] **viXra:1705.0037 [pdf]**
*submitted on 2017-05-03 11:21:26*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 2 Pages.

In this research investigation, the author has presented a Recursive Past Equation. Also, a Recursive Future Equation is presented.

**Category:** Statistics

[168] **viXra:1704.0383 [pdf]**
*submitted on 2017-04-28 23:11:55*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 4 Pages.

In this research investigation, the author has presented two Forecasting Models.

**Category:** Statistics

[167] **viXra:1704.0382 [pdf]**
*submitted on 2017-04-28 23:39:19*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 6 Pages.

In this research investigation, the author has presented two Forecasting Models.

**Category:** Statistics

[166] **viXra:1704.0371 [pdf]**
*submitted on 2017-04-27 23:10:31*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 2 Pages.

In this research investigation, the author has presented a Forecasting Model.

**Category:** Statistics

[165] **viXra:1704.0370 [pdf]**
*submitted on 2017-04-27 23:45:03*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 3 Pages.

In this research investigation, the author has presented an Advanced Forecasting Model.

**Category:** Statistics

[164] **viXra:1704.0368 [pdf]**
*submitted on 2017-04-28 02:54:24*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 3 Pages.

In this research investigation, the author has presented an Advanced Forecasting Model.

**Category:** Statistics

[163] **viXra:1704.0344 [pdf]**
*submitted on 2017-04-26 06:16:55*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 4 Pages.

In this research investigation, the author has presented two forecasting models.

**Category:** Statistics

[162] **viXra:1704.0332 [pdf]**
*submitted on 2017-04-24 23:04:13*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 2 Pages.

In this research investigation, the author has presented two one step forecasting models.

**Category:** Statistics

[161] **viXra:1704.0314 [pdf]**
*submitted on 2017-04-24 04:51:58*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 2 Pages.

In this research investigation, the author has presented two Forecasting Models.

**Category:** Statistics

[160] **viXra:1704.0277 [pdf]**
*submitted on 2017-04-22 03:09:39*

**Authors:** Zhicheng Chen

**Comments:** 4 Pages.

Distributions play a very important role in many applications. Inspired by the newly developed warping transformation of distributions, an indirect nonparametric distribution to distribution regression method is proposed in this article for predicting correlated one-dimensional continuous probability density functions.

**Category:** Statistics

[159] **viXra:1704.0247 [pdf]**
*submitted on 2017-04-19 19:12:16*

**Authors:** Zhicheng Chen, Hui Li

**Comments:** 7 Pages.

In Structural Health Monitoring, there are usually many strain sensors installed in different places of a single structure. The raw measurement of a strain sensor is generally a mixed response caused by different excitations such as moving vehicle loads, ambient temperature, etc. Monitoring data collected by different strain sensors are usually correlated with each other, and the correlation structures of responses caused by different excitations for different sensor pairs are quite diverse and complex. In Structural Health Monitoring, quantitatively describing and modeling complicated dependence structures of strain data is very important in many applications. In this article, copulas are exploited to characterize dependence structures and construct joint distributions of monitoring strain data. The constructed joint distribution is also applied in missing data imputation.

**Category:** Statistics

[158] **viXra:1704.0246 [pdf]**
*submitted on 2017-04-19 20:55:22*

**Authors:** R. Sharma, R. Bhandari

**Comments:** 5 Pages.

It is shown that the formula for the variance of combined series yields surprisingly simple proofs of some well known variance bounds.
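For reference, the standard formula for the variance of two combined series can be written as below. The notation is assumed here, not taken from the paper: the series have sizes $n_1, n_2$, means $\bar{x}_1, \bar{x}_2$, and variances $s_1^2, s_2^2$ computed with divisor $n_i$.

```latex
\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2},
\qquad
s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}
      + \frac{n_1 n_2 \,(\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2}.
```

Because both terms are non-negative, the second term alone gives a lower bound on $s^2$ in terms of the gap between the two sub-series means, which is the kind of inequality such a formula can deliver directly.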

**Category:** Statistics

[157] **viXra:1704.0063 [pdf]**
*submitted on 2017-04-05 12:22:14*

**Authors:** L. Martino, V. Elvira, G. Camps-Valls

**Comments:** 32 Pages. Related Matlab demos at https://github.com/lukafree/GIS.git

Importance Sampling (IS) is a well-known Monte Carlo technique that approximates integrals involving a posterior distribution by means of weighted samples. In this work, we study the assignment of a single weighted sample which compresses the information contained in a population of weighted samples. Part of the theory that we present as Group Importance Sampling (GIS) has already been employed implicitly in different works in the literature. The provided analysis yields several theoretical and practical consequences. For instance, we discuss the application of GIS within the Sequential Importance Resampling (SIR) framework and show that Independent Multiple Try Metropolis (I-MTM) schemes can be interpreted as a standard Metropolis-Hastings algorithm, following the GIS approach. We also introduce two novel Markov Chain Monte Carlo (MCMC) techniques based on GIS. The first one, named the Group Metropolis Sampling (GMS) method, produces a Markov chain of sets of weighted samples. All these sets are then employed for obtaining a unique global estimator. The second one is the Distributed Particle Metropolis-Hastings (DPMH) technique, where different parallel particle filters are jointly used to drive an MCMC algorithm. Different resampled trajectories are compared and then tested with a proper acceptance probability. The novel schemes are tested in different numerical experiments and compared with several benchmark Monte Carlo techniques. Three descriptive Matlab demos are also provided.

**Category:** Statistics

[156] **viXra:1703.0217 [pdf]**
*submitted on 2017-03-23 04:27:14*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 2 Pages.

In this research article, the author has detailed a Novel Scheme of Clustering Based on Natural Metric.

**Category:** Statistics

[155] **viXra:1703.0203 [pdf]**
*submitted on 2017-03-21 08:07:21*

**Authors:** Raymond HV Gallucci

**Comments:** 82 Pages.

Situational Underlying Value (SUV) arose from an attempt to develop an all-encompassing statistic for measuring “clutchiness” for individual baseball players. It was to be based on the “run expectancy” concept, whereby each base with a certain number of outs is “worth” some fraction of a run. Hitters/runners reaching these bases would acquire the “worth” of that base, with the “worth” being earned by the hitter if he reached a base or advanced a runner, or the runner himself if he advanced “on his own” (e.g., stolen base, wild pitch). After several iterations, the version for SUV Baseball presented herein evolved, and it is demonstrated via two games. Subsequently, the concept was extended to professional football and NCAA Men’s Basketball, both with two example games highlighting selected individual players. As with Major League Baseball, these are team games where individual performance may be hard to gauge with a single statistic. This is the goal of SUV, which can be used as a measure both for the team and individual players.

**Category:** Statistics

[154] **viXra:1703.0163 [pdf]**
*submitted on 2017-03-16 12:28:32*

**Authors:** Glenn Healey

**Comments:** 24 Pages.

The deployment of sensors that characterize the trajectory of pitches and batted balls in three dimensions provides the opportunity to assign an intrinsic value to a pitch that depends on its physical properties and not on its observed outcome. We exploit this opportunity by utilizing a Bayesian framework to map five-dimensional PITCHf/x velocity, movement, and location vectors to pitch intrinsic values. HITf/x data is used by the model to obtain intrinsic quality-of-contact values for batted balls that are invariant to the defense, ballpark, and atmospheric conditions. Separate mappings are built to accommodate the effects of count and batter/pitcher handedness. A kernel method is used to generate nonparametric estimates for the component probability density functions in Bayes' theorem while cross-validation enables the model to adapt to the size and structure of the data.

**Category:** Statistics

[153] **viXra:1703.0041 [pdf]**
*submitted on 2017-03-05 02:04:12*

**Authors:** Malte Braband

**Comments:** 16 pages, 4 figures, 2 tables, in German. Jugend Forscht project 130423.

This paper analyses the question of how to systematically reach the top flight of soccer prediction leagues. In a first step, several forecast models are compared and it is shown how most models can be related to the Poisson model. Some of the relations are new. Additionally, a method has been developed that allows the outcome probabilities of soccer championships to be evaluated numerically instead of by simulation. The main practical result for the example of the 2014 soccer World Championship was that the forecast models were significantly better than the human participants of a large public prediction league. However, the differences between the forecast models were small, both qualitatively and quantitatively. Still, it is quite unlikely that a large prediction league will be won by a forecast model, although almost all of the forecast models belonged to the top flight of the prediction league.
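To illustrate what a Poisson model for a single match can look like, here is a minimal sketch that assumes the two teams' goal counts are independent Poisson variables and sums the joint probabilities numerically rather than simulating. The independence assumption, the function names, and the goal rates are illustrative choices, not details taken from the paper.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def outcome_probabilities(lam_home, lam_away, max_goals=20):
    """Return (win, draw, loss) probabilities for the first team.

    Goals are assumed independent Poisson counts; the sum is truncated
    at max_goals per team, which is negligible for realistic rates.
    """
    win = draw = loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                win += p
            elif h == a:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical expected-goal rates, chosen only for illustration.
win, draw, loss = outcome_probabilities(1.6, 1.1)
print(f"win={win:.3f} draw={draw:.3f} loss={loss:.3f}")
```

Evaluating the probabilities by direct summation avoids Monte Carlo noise, which is the general idea behind replacing simulation with numerical evaluation.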

**Category:** Statistics

[152] **viXra:1703.0010 [pdf]**
*submitted on 2017-03-01 14:12:42*

**Authors:** Raymond H.V. Gallucci

**Comments:** 48 Pages.

SUV – Situational Underlying Value – for professional baseball (MLB) is a concept based on the more traditional one of “run expectancy.” This is a statistical estimate of the number of runs expected to result from a base runner or multiple runners given his/their presence at a particular base, or bases, and the number of outs in an inning. Numerous baseball websites discuss this concept; one can find dozens more with a simple internet search on “run expectancy.”
SUV for professional football (NFL) is not as readily conceived as that for baseball, although the concept of each position on the field with down and yards to go has been examined for the possibility of assigning point values (from here on referred to as SUVs). Quantification of this concept is taken from “Expected Points and Expected Points Added Explained,” by Brian Burke, December 7, 2014.
Example applications to a pair of professional baseball games (MLB) and pair of professional football games (NFL) are included that illustrate how the SUV is used.

**Category:** Statistics

[151] **viXra:1702.0308 [pdf]**
*submitted on 2017-02-24 22:04:43*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 9 Pages.

In this research investigation, the author has presented a Novel Forecasting Model based on Locally Linear Transformations, Element Wise Inner Product Mapping, De-Normalization of the Normalized States for predicting the next instant of a Dynamical State given its sufficient history is known.

**Category:** Statistics

[150] **viXra:1702.0294 [pdf]**
*submitted on 2017-02-23 22:02:28*

**Authors:** Ramesh Chandra Bagadi

**Comments:** 5 Pages.

In this research investigation, a Statistical Algorithm is detailed that enables us to pick a Least Biased Random Sample of Size n, from a Data Set of N Points with n<N.

**Category:** Statistics

[149] **viXra:1702.0243 [pdf]**
*submitted on 2017-02-19 11:36:41*

**Authors:** Stephen P. Smith

**Comments:** 26 Pages.

Automatic differentiation is a powerful collection of software tools that are invaluable in many areas including statistical computing. It is well known that automatic differentiation techniques can be applied directly by a programmer in a process called hand coding. However, the advantages of hand coding with certain applications are less appreciated, but these advantages are of paramount importance to statistics in particular. Based on the present literature, the variance component problem using restricted maximum likelihood is an example where hand coding derivatives was very useful relative to automatic or algebraic approaches. Some guidelines for hand coding backward derivatives are also provided, and emphasis is given to techniques for reducing space complexity and computing second derivatives.

**Category:** Statistics

[148] **viXra:1701.0420 [pdf]**
*submitted on 2017-01-10 13:45:13*

**Authors:** Nikhil Shaw

**Comments:** 8 Pages.

In computer science, a selection algorithm is an algorithm for finding the kth smallest number in a list or array; such a number is called the kth order statistic. This includes the cases of finding the minimum, maximum, and median elements. There are O(n) (worst-case linear time) selection algorithms, and sublinear performance is possible for structured data; in the extreme, O(1) for an array of sorted data. Selection is a subproblem of more complex problems like the nearest neighbor and shortest path problems. Many selection algorithms are derived by generalizing a sorting algorithm, and conversely some sorting algorithms can be derived as repeated application of selection.
Although this new algorithm has a worst case of O(n^2), its average case is near-linear time for an unsorted list.
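For comparison, the classic quickselect algorithm has the same complexity profile: O(n^2) in the worst case and linear time on average. The sketch below is quickselect with a random pivot, shown only as a familiar reference point; it is not the algorithm proposed in the paper, and the function name is an illustrative choice.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element of values (k = 1 gives the minimum).

    Classic quickselect with a random pivot: linear time on average,
    quadratic in the worst case.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    items = list(values)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        if k <= len(lows):
            # The answer lies among the elements smaller than the pivot.
            items = lows
        elif k <= len(lows) + len(pivots):
            # The answer is the pivot itself.
            return pivot
        else:
            # Discard the smaller elements and the pivots, then adjust the rank.
            k -= len(lows) + len(pivots)
            items = [x for x in items if x > pivot]


data = [7, 2, 9, 4, 4, 1]
print(quickselect(data, 3))  # third smallest element
```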

**Category:** Statistics

[147] **viXra:1701.0325 [pdf]**
*submitted on 2017-01-08 03:55:27*

**Authors:** Ilija Barukčić

**Comments:** 29 Pages. Copyright © 2017 by Ilija Barukčić, Jever, Germany. Published by:

Epstein-Barr Virus (EBV) has been widely proposed as a possible candidate virus for the viral etiology of human breast cancer, still the most common malignancy affecting females worldwide. Due to possible problems with PCR analyses (contamination), the lack of uniformity in the study design and insufficient mathematical/statistical methods used by the different authors, findings of several EBV (polymerase chain reaction (PCR)) studies contradict each other, making it difficult to determine the EBV etiology for breast cancer. In this present study, we performed a re-investigation of some of the known studies. To place our results in context, this study supports the hypothesis that EBV is a cause of human breast cancer.

**Category:** Statistics

[146] **viXra:1701.0296 [pdf]**
*submitted on 2017-01-05 13:34:46*

**Authors:** Ilija Barukčić

**Comments:** 7 Pages. Copyright © 2017 by Ilija Barukčić, Jever, Germany. Published by: Journal of Biosciences and Medicines, 2018, Vol.6 No.1, pp. 75-100.

Epstein-Barr virus (EBV), a herpes virus which persists in memory B cells in the peripheral blood for the lifetime of a person, is associated with some malignancies. Many studies have suggested that the Epstein-Barr virus contributes to the development of Hodgkin's lymphoma (HL) in some cases too. Despite intensive study, the role of Epstein-Barr virus in Hodgkin's lymphoma remains enigmatic. It is the purpose of this publication to prove that the Epstein-Barr virus is a main cause of Hodgkin’s lymphoma (k=+0,739814235, p Value = 0,000000000000138).

**Category:** Statistics

[145] **viXra:1701.0009 [pdf]**
*submitted on 2017-01-02 10:00:30*

**Authors:** Ilija Barukčić

**Comments:** 9 Pages. Copyright © 2016 by Ilija Barukčić, Jever, Germany. Published by Journal of Biosciences and Medicines, Vol.5 No. 2, p. 1-9. https://doi.org/10.4236/jbm.2017.52001

Background: Many studies documented an association between a Helicobacter pylori infection and the development of human gastric cancer. None of these studies were able to identify Helicobacter pylori as a cause or as the cause of human gastric cancer. The basic relation between gastric cancer and Helicobacter pylori still remains uncertain.
Objectives: This systematic review and re-analysis of the available long-term, prospective study of 1526 Japanese patients by Naomi Uemura et al. is performed so that some new and meaningful inference can be drawn.
Materials and Methods: Data obtained by Naomi Uemura et al., who conducted a long-term, prospective study of 1526 Japanese patients with a mean follow-up of about 7.8 years and endoscopy at enrolment and then between one and three years after enrolment, were re-analysed.
Statistical analysis used:
The method of the conditio sine qua non relationship was used to prove the hypothesis that without a Helicobacter pylori infection there is no development of human gastric cancer. The mathematical formula of the causal relationship was used to test whether there is a cause-effect relationship between a Helicobacter pylori infection and human gastric cancer. Significance was indicated by a P value of less than 0.05.
Results:
Based on the data published by Uemura et al., we were able to provide evidence that without a Helicobacter pylori infection there is no development of human gastric cancer. In other words, a Helicobacter pylori infection is a conditio sine qua non of human gastric cancer. In the same respect, the data of Uemura et al. provide significant evidence that a Helicobacter pylori infection is the cause of human gastric cancer.
Conclusions:
Without a Helicobacter pylori infection, there is no development of human gastric cancer. Helicobacter pylori is the cause (k=+0,07368483, p Value = 0.00399664) of human gastric cancer.

**Category:** Statistics

[144] **viXra:1612.0361 [pdf]**
*submitted on 2016-12-28 07:08:27*

**Authors:** Ilija Barukčić

**Comments:** 5 Pages. Copyright © 2016 by Ilija Barukčić, Jever, Germany. Published by:

Objective.
Many studies presented some evidence that EBV might play a role in the pathogenesis of rheumatoid arthritis. Still, there are conflicting reports concerning the existence of EBV in the synovial tissue of patients suffering from rheumatoid arthritis.
Material and methods.
Takeda et al. designed a study to detect EBV DNA in synovial tissues obtained at synovectomy or arthroplasty from 32 patients with rheumatoid arthritis (RA) and 30 control patients (no rheumatoid arthritis). In this study, the data as published by Takeda et al. were re-analysed.
Results.
EBV infection of human synovial tissues is a condition per quam of rheumatoid arthritis. And much more than this. There is a highly significant causal relationship between an EBV infection of human synovial tissues and rheumatoid arthritis (k= +0,546993718, p-value = 0,00001655).
Conclusion.
These findings suggest that EBV infection of human synovial tissues is a main cause of rheumatoid arthritis.

**Category:** Statistics

[143] **viXra:1612.0240 [pdf]**
*submitted on 2016-12-14 04:46:37*

**Authors:** M. J. Germuska

**Comments:** 6 Pages.

This paper analyses the data for the masses of elementary particles provided by the Particle Data Group (PDG). It finds evidence that the best mass estimates are not based solely on statistics but also on overall consistency, which sometimes results in skewed minimum and maximum mass limits. The paper also points out some other quirks that result in minimum and maximum mass limits which are far from the statistical standard deviation. A statistical method is proposed to compute the standard deviation in such cases and when PDG does not provide any limits.

**Category:** Statistics

[142] **viXra:1611.0037 [pdf]**
*submitted on 2016-11-03 07:20:40*

**Authors:** Minyu Feng, Hong Qu, Zhang Yi, Jürgen Kurths

**Comments:** 11 pages

During the last decades, Power-law distributions played significant roles in analyzing the topology of scale-free (SF) networks. However, in the observation of degree distributions of practical networks and other unequal distributions such as wealth distribution, we uncover that, instead of decreasing monotonically, most real distributions have a peak at the beginning, which cannot be accurately described by a Power-law. In this paper, in order to break the limitation of the Power-law distribution, we provide detailed derivations of a novel distribution called the Subnormal distribution from evolving networks with variable elements, together with its concrete statistical properties. Additionally, simulations of fitting the subnormal distribution to the degree distribution of evolving networks, a real social network, and personal wealth distribution are displayed to show the fit of the proposed distribution.

**Category:** Statistics

[141] **viXra:1610.0010 [pdf]**
*submitted on 2016-10-01 17:00:34*

**Authors:** Marisol García-Peña, Sergio Arciniegas-Alarcón, Wojtek Krzanowski, Décio Barbin

**Comments:** 15 Pages.

GabrielEigen is a simple deterministic imputation system without structural or distributional assumptions, which uses a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition. We provide multiple imputation alternatives (MI) based on this system, by adding random quantities and generating approximate confidence intervals with different widths to the imputations using cross-validation (CV). These methods are assessed by a simulation study using real data matrices in which values are deleted randomly at different rates, and also in a case where the missing observations have a systematic pattern. The quality of the imputations is evaluated by combining the variance between imputations (Vb) and their mean squared deviations from the deleted values (B) into an overall measure (Tacc). It is shown that the best performance occurs when the interval width matches the imputation error associated with GabrielEigen.

**Category:** Statistics

[140] **viXra:1609.0230 [pdf]**
*submitted on 2016-09-15 04:35:44*

**Authors:** L. Martino, V. Elvira, G. Camps-Valls

**Comments:** 13 Pages.

Monte Carlo methods are essential tools for Bayesian inference. Gibbs sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively used in statistical signal processing, machine learning and statistics, employed to draw samples from complicated high-dimensional posterior distributions. The key point for the successful application of the Gibbs sampler is the ability to draw efficiently from the full-conditional pdfs. In the general case, this is not possible and it requires the generation of auxiliary samples that are wasted, since they are not used in the final estimators. In this work, we show that these auxiliary samples can be employed within the Gibbs estimators, improving their efficiency with no extra cost. This novel scheme arises naturally after pointing out the relationship between the Gibbs sampler and the chain rule used for sampling purpose. Numerical simulations confirm the excellent performance of the novel scheme.

**Category:** Statistics

[139] **viXra:1609.0215 [pdf]**
*submitted on 2016-09-14 01:54:49*

**Authors:** editors Sachin Malik, Neeraj Kumar, Florentin Smarandache

**Comments:** 90 Pages.

The main aim of the present book is to suggest some improved estimators using auxiliary and attribute information in case of simple random sampling and stratified random sampling and some inventory models related to capacity constraints.
This volume is a collection of five papers, written by six co-authors (listed in the order of the papers): Dr. Rajesh Singh, Dr. Sachin Malik, Dr. Florentin Smarandache, Dr. Neeraj Kumar, Mr. Sanjey Kumar & Pallavi Agarwal.
In the first chapter, the authors suggest an estimator using two auxiliary variables in stratified random sampling for estimating the population mean.
In the second chapter, they propose a family of estimators for estimating population means using known values of some population parameters.
In the third chapter, an almost unbiased estimator using known value(s) of some population parameter(s) with known population proportion of an auxiliary variable is used.
In Chapter four, the authors investigate a fuzzy economic order quantity model for two storage facilities. The demand, holding cost, ordering cost, and storage capacity of the own-warehouse are taken as trapezoidal fuzzy numbers.
In Chapter five, a two-warehouse inventory model deals with deteriorating items, with a stock-dependent demand rate, affected by inflation under the pattern of time value of money over a finite planning horizon. Shortages are allowed and partially backordered depending on the waiting time for the next replenishment. The purpose of this model is to minimize the total inventory cost by using the genetic algorithm.
This book will be helpful for the researchers and students who are working in the field of sampling techniques and inventory control.

**Category:** Statistics

[138] **viXra:1609.0210 [pdf]**
*submitted on 2016-09-13 11:02:40*

**Authors:** Russell Leidich

**Comments:** 13 Pages. This work is licensed under a Creative Commons Attribution 4.0 International License.

Unlike other common transcendental functions such as log and sine, James Stirling's convergent series for the loggamma (“logΓ”) function suggests no obvious method by which to ascertain meaningful bounds on the error due to truncation after a particular number of terms. (“Convergent” refers to the fact that his original formula appeared to converge, but ultimately diverged.) As such, it remains an anathema to the interval arithmetic algorithms which underlie our confidence in its various numerical applications.
Certain error bounds do exist in the literature, but involve branches and procedurally generated rationals which defy straightforward implementation via interval arithmetic.
In order to ameliorate this situation, we derive error bounds on the loggamma function which are readily amenable to such methods.

**Category:** Statistics

[137] **viXra:1609.0145 [pdf]**
*submitted on 2016-09-11 14:55:29*

**Authors:** A. A. Salama, Rafif alhbeib

**Comments:** 13 Pages.

The importance of this research lies in reaching new horizons in probability theory, which we will call neutrosophic classical probability theory. Its foundations were laid by Ahmed Salama and Florentin Smarandache, and it results from applying neutrosophic logic to classical probability theory. Salama and Smarandache defined the classical neutrosophic set by three partial components of the classical universal set (the sample space), and three components of the fuzzy set, namely truth, falsity, and neutrality (indeterminacy). Extending the concepts of Salama and Smarandache, we study the probability of these new sets, derive the properties of this probability, and compare it with classical probability.
It should be mentioned that these ideas can help researchers and be of great benefit to them in the future in finding new algorithms for solving decision-support problems.
Research problem:
The development of the sciences has confronted probability theory with a large number of new problems that cannot be explained within the framework of the classical theory, and probability theory had no general or specific methods that precisely explain phenomena occurring at a given time. It was therefore necessary to broaden the data under study and describe them precisely in order to obtain more realistic probabilities and make sounder decisions. This is where neutrosophic logic comes in: it provides two types of neutrosophic sets that generalize the fuzzy and classical notions of sets and events, which are the first building block in the study of neutrosophic probabilities.
Research objectives:
This study aims to:
1- Present the theory of neutrosophic sets of the classical type and the fuzzy type.
2- Present and define the neutrosophic probability of neutrosophic sets.
3- Build tools for developing neutrosophic probability and studying its properties.
4- Present probabilistic definitions and theorems according to the new neutrosophic logic.
5- Compare the results obtained using neutrosophic probability with classical probability.
6- Present the results of applying neutrosophic probabilities to the decision-making process.

**Category:** Statistics

[136] **viXra:1608.0403 [pdf]**
*submitted on 2016-08-30 03:06:12*

**Authors:** Sascha Vongehr

**Comments:** 6 Pages. 2 Figures

Ashkenazi Jews (AJ) comprise roughly 30% of Nobel Prize winners, ‘elite institute’ faculty, etc. Mean AJ intelligence quotients (IQ) fail to explain this, because AJ are only 2.2% of the US population. The growing anti-Semitic right wing supports conspiracy theories with this. However, deviations depend on means. This lifts the right wing of the AJ IQ distribution. Alternative mechanisms such as intellectual AJ culture or in-group collaboration, even if real, must be regarded as included through their IQ-dependence. Antisemitism is thus opposed in its own domain of discourse; it is an anti-intelligence position inconsistent with eugenics.

**Category:** Statistics

[135] **viXra:1608.0152 [pdf]**
*submitted on 2016-08-15 04:41:51*

**Authors:** Tahsin Olgu Benli, Hatice Sengul

**Comments:** 10 Pages.

It is vital for suppliers and distributors to predict deregulated electricity prices when creating their bidding strategies in a competitive market. Success in this field requires accurate and suitable tools for forecasting electricity tariff prices. With effective forecasting tools, decisions on production, merchandising, maintenance and investment can be made successfully and effectively, with the aim of maximizing profits and benefits. According to electricity demand, there are four electricity tariff pricing periods in Turkey: monochromic, day, peak and night. The objective is to find the most suitable tool for predicting the four pricing periods of electricity and to produce short-term forecasts (one year ahead, monthly). Our approach is based on finding the best model, the one that yields the smallest forecasting error as measured by MAPE, MAD and MSD. We compare nineteen forecasting approaches in total, each of which differs in some aspect of its methodology. Our first step was to produce forecasts for the year 2015. We validated and analyzed the performance of our best model and compared how well the historical values of 2015 matched the forecasted data for that period. The results show that, given the time-series data, the recommended models provided good forecasts. In the second part of the study, we also include the year 2015 and compute all the models with the time series of January 2011 – December 2015. Again choosing the most appropriate forecasting model, we conducted the forecasting process and also analyzed the impact of extending the time series period (January 2007 to December 2015) on the model used for forecasting.
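
The three error measures named in this abstract can be sketched as follows. This is a minimal illustration under the usual textbook definitions (mean absolute percentage error, mean absolute deviation, mean squared deviation); the function names and the sample numbers are hypothetical and are not taken from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mad(actual, forecast):
    """Mean absolute deviation between actual and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def msd(actual, forecast):
    """Mean squared deviation between actual and forecast values."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical monthly prices and forecasts, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]

for name, measure in (("MAPE", mape), ("MAD", mad), ("MSD", msd)):
    print(name, round(measure(actual, forecast), 4))
```

Under a selection rule like the one described, the candidate model with the smallest values of these measures on held-out data would be preferred; how ties between the three measures are resolved is not stated in the abstract.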

**Category:** Statistics

[134] **viXra:1607.0526 [pdf]**
*submitted on 2016-07-27 14:23:22*

**Authors:** Sergio Arciniegas-Alarcón, Marisol García-Peña, Wojtek Krzanowski

**Comments:** 9 Pages.

We propose a new methodology for multiple imputation when faced with missing data in multi-environmental trials with genotype-by-environment interaction, based on the imputation system developed by Krzanowski that uses the singular value decomposition (SVD) of a matrix. Several different iterative variants are described; differential weights can also be included in each variant to represent the influence of different components of SVD in the imputation process. The methods are compared through a simulation study based on three real data matrices that have values deleted randomly at different percentages, using as measure of overall accuracy a combination of the variance between imputations and their mean square deviations relative to the deleted values. The best results are shown by two of the iterative schemes that use weights belonging to the interval [0.75, 1]. These schemes provide imputations that have higher quality when compared with other multiple imputation methods based on the Krzanowski method.

**Category:** Statistics

[133] **viXra:1607.0497 [pdf]**
*submitted on 2016-07-26 16:13:09*

**Authors:** Glenn Healey

**Comments:** 7 Pages.

Given information about batted balls for a set of players, we review techniques for estimating the reliability of a statistic as a function of the sample size. We also review methods for using the estimated reliability to compute the variance of true talent and to generate forecasts.

**Category:** Statistics

[132] **viXra:1607.0471 [pdf]**
*submitted on 2016-07-25 06:41:23*

**Authors:** Baokun Li, Gang Xiang, Vladik Kreinovich, Panagios Moscopoulos

**Comments:** 12 Pages.

One of the main objectives of statistics is to estimate the parameters of a probability distribution based on a sample taken from this distribution.

**Category:** Statistics

[131] **viXra:1607.0393 [pdf]**
*submitted on 2016-07-21 14:54:41*

**Authors:** Marisol García-Peña, Sergio Arciniegas-Alarcón, Kaye Basford, Carlos Tadeu dos Santos Dias

**Comments:** 13 Pages.

In multi-environment trials it is common to measure several response variables or attributes to determine the genotypes with the best characteristics. Thus it is important to have techniques to analyse multivariate multi-environment trial data. The main objective is to complement the literature on two multivariate techniques, the mixture maximum likelihood method of clustering and three-mode principal component analysis, used to analyse genotypes, environments and attributes simultaneously. In this way, both global and detailed statements about the performance of the genotypes can be made, highlighting the benefit of using three-way data in a direct way and providing an alternative analysis for researchers. We illustrate using sunflower data with twenty genotypes, eight environments and three attributes. The procedures provide an analytical procedure which is relatively easy to apply and interpret in order to describe the patterns of performance and associations in multivariate multi-environment trials.

**Category:** Statistics

[130] **viXra:1607.0244 [pdf]**
*submitted on 2016-07-18 06:02:32*

**Authors:** Florentin Smarandache

**Comments:** 3 Pages.

As in nature nothing is absolute, evidently there will not exist a precise border between the scientific language and “the literary” one (the language used in literature): thus there will be zones where these two languages intersect.

**Category:** Statistics

[129] **viXra:1605.0241 [pdf]**
*submitted on 2016-05-23 09:32:06*

**Authors:** Jianwen Huang

**Comments:** 11 Pages.

In this article, the high-order asymptotic expansions of cumulative distribution function and probability density function of extremes for generalized Maxwell distribution are established under nonlinear normalization. As corollaries, the convergence rates of the distribution and density of maximum are obtained under nonlinear normalization.

**Category:** Statistics

[128] **viXra:1604.0302 [pdf]**
*submitted on 2016-04-22 01:25:58*

**Authors:** Bradly Alicea

**Comments:** 13 pages, 7 Figures, 2 Supplemental Figures. Full dataset can be found at doi:10.6084/m9.figshare.944542

What makes a good prediction good? Generally, the answer is thought to be a faithful accounting of both tangible and intangible factors. Among sports teams, it is thought that if you get enough of the tangible factors (e.g. roster, prior performance, schedule) correct, then the predictions will be correspondingly accurate. While there is a role for intangible factors, they are thought to gum up the works, so to speak. Here, I start with the hypothesis that the best and worst teams in a league or tournament are easy to predict relative to teams with average performance. Data from the 2013 MLB and NFL seasons plus data from the 2014 NCAA Tournament were used. Using a model-free approach, data representing various aspects of competition reveal that mainly the teams predicted to perform the worst actually conform to expectation. The reasons for this are then discussed, including the role of shot noise on performance driven by tangible factors.

**Category:** Statistics

[127] **viXra:1604.0009 [pdf]**
*submitted on 2016-04-01 12:11:19*

**Authors:** Ioannis Koukoutsidis

**Comments:** 28 pages, 8 figures

Mobile crowdsensing can facilitate environmental surveys by leveraging sensor equipped mobile devices that carry out measurements covering a wide area in a short time, without bearing the costs of traditional field work. In this paper, we examine statistical methods to perform an accurate estimate of the mean value of an environmental parameter in a region, based on such measurements. The main focus is on estimates produced by taking a "snapshot" of the mobile device readings at a random instant in time. We compare stratified sampling with different stratification weights to sampling without stratification, as well as an appropriately modified version of systematic sampling. Our main result is that stratification with weights proportional to stratum areas can produce significantly smaller bias, and gets arbitrarily close to the true area average as the number of mobiles and the number of strata increase. The performance of the methods is evaluated for an application scenario where we estimate the mean area temperature in a linear region that exhibits the so-called *Urban Heat Island* effect, with mobile users moving in the region according to the Random Waypoint Model.
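
To make the contrast between unstratified and area-weighted stratified estimates concrete, here is a small sketch. The stratum names, areas and readings are hypothetical, and renormalising the weights over strata that actually contain readings is an assumption of this sketch, not something stated in the abstract.

```python
def simple_mean(readings):
    """Unstratified estimate: plain average of all (stratum, value) readings."""
    values = [value for _, value in readings]
    return sum(values) / len(values)


def stratified_mean(readings, stratum_areas):
    """Stratified estimate with weights proportional to stratum areas.

    Strata without any reading are skipped and the remaining weights are
    renormalised (an assumption made for this illustration).
    """
    by_stratum = {}
    for stratum, value in readings:
        by_stratum.setdefault(stratum, []).append(value)
    covered_area = sum(stratum_areas[s] for s in by_stratum)
    estimate = 0.0
    for stratum, values in by_stratum.items():
        weight = stratum_areas[stratum] / covered_area
        estimate += weight * (sum(values) / len(values))
    return estimate


# Hypothetical snapshot: devices cluster in the warm centre of the region.
areas = {"centre": 1.0, "edge": 1.0}
snapshot = [("centre", 30.0), ("centre", 30.0), ("centre", 30.0), ("edge", 20.0)]

print(simple_mean(snapshot))             # pulled towards the crowded stratum
print(stratified_mean(snapshot, areas))  # each stratum counts by its area
```

In this toy snapshot the plain average is dominated by the stratum where most devices happen to be, while the area-weighted estimate treats the two equally sized strata equally.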

**Category:** Statistics

[126] **viXra:1603.0252 [pdf]**
*submitted on 2016-03-17 17:00:15*

**Authors:** Glenn Healey

**Comments:** 4 Pages.

This file contains an intrinsic contact list for batters.

**Category:** Statistics

[125] **viXra:1603.0251 [pdf]**
*submitted on 2016-03-17 17:02:40*

**Authors:** Glenn Healey

**Comments:** 3 Pages.

This file contains an intrinsic contact list for pitchers.

**Category:** Statistics

[124] **viXra:1603.0215 [pdf]**
*submitted on 2016-03-14 21:01:06*

**Authors:** Glenn Healey

**Comments:** 7 Pages.

Given a set of observed batted balls and their outcomes, we develop a method for learning the dependence of a batted ball’s intrinsic value on its measured parameters.

**Category:** Statistics

[123] **viXra:1603.0180 [pdf]**
*submitted on 2016-03-11 17:50:17*

**Authors:** L. Martino, J. Plata, F. Louzada

**Comments:** 5 Pages.

In this work, we design an efficient Monte Carlo scheme for diffusion estimation, where global and local parameters are involved in a unique inference problem. This scenario often appears in distributed inference problems in wireless sensor networks. The proposed scheme uses parallel local MCMC chains and then an importance sampling (IS) fusion for obtaining an efficient estimation of the global parameters. The resulting algorithm is simple and flexible. It can be easily applied iteratively, or extended in a sequential framework. In order to apply the novel scheme, the only assumption required about the model is that the measurements are conditionally independent given the related parameters.

**Category:** Statistics

[122] **viXra:1602.0333 [pdf]**
*submitted on 2016-02-25 18:17:42*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 5 Pages.

The Sequential Importance Resampling (SIR) method is the core of the Sequential Monte Carlo (SMC) algorithms (a.k.a., particle filters). In this work, we point out a suitable choice for weighting properly a resampled particle. This observation entails several theoretical and practical consequences, allowing also the design of novel sampling schemes. Specifically, we describe one theoretical result about the sequential estimation of the marginal likelihood. Moreover, we suggest a novel resampling procedure for SMC algorithms called partial resampling, involving only a subset of the current cloud of particles. Clearly, this scheme attenuates the additional variance in the Monte Carlo estimators generated by the use of the resampling.

**Category:** Statistics

[121] **viXra:1602.0112 [pdf]**
*submitted on 2016-02-09 14:48:10*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 31 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In IS context, an approximation of the theoretical ESS definition is widely applied, $\widehat{ESS}$, involving the sum of the squares of the normalized importance weights. This formula $\widehat{ESS}$ has become an essential piece within Sequential Monte Carlo (SMC) methods using adaptive resampling procedures. The expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.
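
As a rough illustration of two of the quantities mentioned in this abstract, the sketch below computes the usual approximation based on the sum of squared normalized weights and a perplexity-based alternative (the exponential of the discrete entropy of the normalized weights). The function names are hypothetical; the other measures discussed in the paper are not reproduced here.

```python
import math


def normalize(weights):
    """Scale non-negative importance weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def ess_inverse_sum_squares(weights):
    """Widely used approximation: 1 / sum of squared normalized weights."""
    normalized = normalize(weights)
    return 1.0 / sum(w * w for w in normalized)


def ess_perplexity(weights):
    """Perplexity: exponential of the entropy of the normalized weights."""
    normalized = normalize(weights)
    entropy = -sum(w * math.log(w) for w in normalized if w > 0.0)
    return math.exp(entropy)


uniform = [1.0, 1.0, 1.0, 1.0]     # all samples equally useful
degenerate = [1.0, 0.0, 0.0, 0.0]  # one sample carries all the weight

for weights in (uniform, degenerate):
    print(ess_inverse_sum_squares(weights), ess_perplexity(weights))
```

Both functions return a value near the number of samples for uniform weights and a value of one when a single weight dominates, which are the two extreme cases any such measure is expected to handle.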

**Category:** Statistics

[120] **viXra:1602.0053 [pdf]**
*submitted on 2016-02-05 03:06:47*

**Authors:** Jason Lind

**Comments:** 2 Pages. Very early stages

Defines a rated set and uses it to calculate a weight directly from the statistics, enabling a broad, unified interpretation of the data.

**Category:** Statistics

[119] **viXra:1601.0179 [pdf]**
*submitted on 2016-01-16 22:40:19*

**Authors:** D. Luengo, L. Martino, V. Elvira, M. Bugallo

**Comments:** 22 Pages.

Many signal processing applications require performing statistical inference on large datasets, where computational and/or memory restrictions become an issue. In this big data setting, computing an exact global centralized estimator is often unfeasible. Furthermore, even when approximate numerical solutions (e.g., based on Monte Carlo methods) working directly on the whole dataset can be computed, they may not provide a satisfactory performance either. Hence, several authors have recently started considering distributed inference approaches, where the data is divided among multiple workers (cores, machines or a combination of both). The computations are then performed in parallel and the resulting distributed or partial estimators are finally combined to approximate the intractable global estimator. In this paper, we focus on the scenario where no communication exists among the workers, deriving efficient linear fusion rules for the combination of the distributed estimators. Both a Bayesian perspective (based on the Bernstein-von Mises theorem and the asymptotic normality of the estimators) and a constrained optimization view are provided for the derivation of the linear fusion rules proposed. We concentrate on minimum mean squared error (MMSE) partial estimators, but the approach is more general and can be used to combine any kind of distributed estimators as long as they are unbiased. Numerical results show the good performance of the algorithms developed, both in simple problems where analytical expressions can be obtained for the distributed MMSE estimators, and in a wireless sensor network localization problem where Monte Carlo methods are used to approximate the partial estimators.
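
One classical example of a linear fusion rule for independent, unbiased scalar estimators is inverse-variance weighting. The sketch below shows that rule only; it is an assumption chosen for illustration and is not claimed to be the specific set of rules derived in the paper. The function name and the numbers are hypothetical.

```python
def fuse_inverse_variance(estimates, variances):
    """Combine independent unbiased scalar estimates with weights
    proportional to the inverse of their variances.

    Returns the fused estimate and its variance.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    fused = sum(p * e for p, e in zip(precisions, estimates)) / total_precision
    return fused, 1.0 / total_precision


# Hypothetical partial estimates reported by two workers, with their variances.
fused, fused_var = fuse_inverse_variance([1.0, 3.0], [1.0, 3.0])
print(fused, fused_var)
```

The more precise worker receives the larger weight, and the variance of the fused estimate is no larger than that of the best individual estimate, without any communication among the workers.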

**Category:** Statistics

[118] **viXra:1601.0174 [pdf]**
*submitted on 2016-01-16 07:32:42*

**Authors:** V. Elvira, L. Martino, D. Luengo, M. F. Bugallo

**Comments:** 34 Pages.

Population Monte Carlo (PMC) sampling methods are powerful tools for approximating distributions of static unknowns given a set of observations. These methods are iterative in nature: at each step they generate samples from a proposal distribution and assign them weights according to the importance sampling principle. Critical issues in applying PMC methods are the choice of the generating functions for the samples and the avoidance of the sample degeneracy. In this paper, we propose three new schemes that considerably improve the performance of the original PMC formulation by allowing for better exploration of the space of unknowns and by selecting more adequately the surviving samples.
A theoretical analysis is performed, proving the superiority of the novel schemes in terms of variance of the associated estimators and preservation of the sample diversity.
Furthermore, we show that they outperform other state of the art algorithms (both in terms of mean square error and robustness w.r.t. initialization) through extensive numerical simulations.

**Category:** Statistics

[117] **viXra:1601.0167 [pdf]**
*submitted on 2016-01-16 03:40:15*

**Authors:** Ilija Barukčić

**Comments:** Pages.

Titans like Bertrand Russell and Karl Pearson warned us to keep our mathematical and statistical hands off causality, and in the end so did David Hume. Hume's scepticism has dominated discussion of causality in both analytic philosophy and statistical analysis for a long time. But more and more researchers are working hard in this field and trying to get rid of these positions. So far, much of the recent philosophical or mathematical writing on causation (Ellery Eells (1991), Daniel Hausman (1998), Pearl (2000), Peter Spirtes, Clark Glymour and Richard Scheines (2000), ...) addresses either Bayes networks, the counterfactual approach to causality developed in detail by David Lewis, Reichenbach's Principle of the Common Cause or the Causal Markov Condition. None of these approaches to causation has investigated the relationship between causation and the law of independence to the necessary extent. Nonetheless, the relationship between causation and the law of independence, one of the fundamental concepts in probability theory, is very important. May an effect occur in the absence of a cause? May an effect fail to occur in the presence of a cause? What, then, constitutes the causal relation? On the other hand, if it is unclear what constitutes the causal relation, maybe we can answer the question of what does not constitute the causal relation. A cause as such cannot be independent of its effect, and vice versa, if there is a deterministic causal relationship. This publication will prove that the law of independence defines causation to some extent ex negativo.

**Category:** Statistics

[116] **viXra:1601.0070 [pdf]**
*submitted on 2016-01-07 16:41:10*

**Authors:** J.Tiago de Oliveira

**Comments:** 37 Pages.

Statistical Analysis of Extremes
chapter 3

**Category:** Statistics

[115] **viXra:1601.0069 [pdf]**
*submitted on 2016-01-07 16:42:58*

**Authors:** J.Tiago de Oliveira

**Comments:** 11 Pages.

Statistical Analysis of Extremes
chapter 4

**Category:** Statistics

[114] **viXra:1601.0032 [pdf]**
*submitted on 2016-01-05 10:37:48*

**Authors:** M. Srinivas, S. Sambasiva Rao

**Comments:** 7 Pages. This paper has been published in Indian Journal of Physical Education and Allied Sciences, ISSN: 2395-6895, Vol.1, No.5, pp.37-44.

The statistical analysis of angular data is typically encountered in biological and geological studies, among several other areas of research. Circular data is the simplest case of this category of data called directional data, where the single response is not scalar, but angular or directional. A statistical analysis pertaining to two dimensional directional data is generally referred to as “Circular Statistics”. In this paper, an attempt is made to review various fundamental concepts of circular statistics and to discuss its applicability in sports science.
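
A short sketch of two basic circular summaries, the mean direction and the mean resultant length, shows why angular data need their own treatment. The function names and the sample angles are hypothetical and serve only as an illustration of the standard definitions.

```python
import math


def circular_mean(angles_deg):
    """Mean direction in degrees, in [0, 360), from the summed unit vectors."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def mean_resultant_length(angles_deg):
    """Concentration measure in [0, 1]; values near 1 mean tightly clustered angles."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.hypot(s, c) / len(angles_deg)


# Hypothetical directions clustered around 10 degrees, straddling 0/360.
angles = [350.0, 10.0, 30.0]
print(sum(angles) / len(angles))      # arithmetic mean, misleading for angles
print(circular_mean(angles))          # mean direction
print(mean_resultant_length(angles))  # how concentrated the directions are
```

The arithmetic mean of these three angles points almost opposite to the cluster, whereas the mean direction stays inside it, which is the basic motivation for circular statistics.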

**Category:** Statistics

[113] **viXra:1512.0448 [pdf]**
*submitted on 2015-12-26 16:50:32*

**Authors:** J.Tiago de Oliveira

**Comments:** 36 Pages.

Second chapter
Statistical Analysis of Extremes
Pendor, Lisbon, 1997

**Category:** Statistics

[112] **viXra:1512.0436 [pdf]**
*submitted on 2015-12-26 12:04:44*

**Authors:** J.Tiago de Oliveira

**Comments:** 9 Pages. First chapter

J. Tiago de Oliveira's last book followed the research started by Emil Julius Gumbel.

**Category:** Statistics

[111] **viXra:1512.0420 [pdf]**
*submitted on 2015-12-25 09:53:50*

**Authors:** L. Martino, J. Read, V. Elvira, F. Louzada

**Comments:** 21 Pages.

We design a sequential Monte Carlo scheme for the joint purpose of Bayesian inference and model selection, with application to an urban mobility context in which different modalities of movement can be employed. In this case, we have the joint problem of online tracking and detection of the current modality.
For this purpose, we use interacting parallel particle filters each one addressing a different model. They cooperate for providing a global estimator of the variable of interest and, at the same time, an approximation of the posterior density of the models given the data. The interaction occurs by a parsimonious distribution of the computational effort, adapting on-line the number of particles of each filter according to the posterior probability of the corresponding model. The resulting scheme is simple and provides good results in different numerical experiments with artificial and real data.

**Category:** Statistics

[110] **viXra:1512.0319 [pdf]**
*submitted on 2015-12-14 09:37:41*

**Authors:** H. Jabbari, M. Erfaniyan

**Comments:** 10 Pages.

Let $\{X_n, n \geq 1\}$ be a strictly stationary sequence of negatively associated random variables, with common continuous and bounded distribution function $F$. We consider the estimation of the two-dimensional distribution function of $(X_1, X_{k+1})$ based on kernel type estimators as well as the estimation of the covariance function of the limit empirical process induced by the sequence $\{X_n, n \geq 1\}$, where $k \in \mathbb{N}_0$. Then, we derive uniform strong convergence rates for the kernel estimator of the two-dimensional distribution function of $(X_1, X_{k+1})$ which were not found already and do not need any conditions on the covariance structure of the variables. Furthermore, assuming a convenient decrease rate of the covariances $\mathrm{Cov}(X_1, X_{n+1})$, $n \geq 1$, we prove uniform strong convergence rate for the covariance function of the limit empirical process based on kernel type estimators. Finally, we use a simulation study to compare the estimators of the distribution function of $(X_1, X_{k+1})$.

**Category:** Statistics

[109] **viXra:1512.0294 [pdf]**
*submitted on 2015-12-12 02:35:48*

**Authors:** Amelia Carolina Sparavigna

**Comments:** 4 Pages. Published in International Journal of Sciences, 2015, 4(10):1-4. DOI:10.18483/ijSci.845

Mutual information of two random variables can be easily obtained from their Shannon entropies. However, when nonadditive entropies are involved, the calculation of the mutual information is more complex. Here we discuss the basics of information from Shannon entropy. Then we analyse the case of the generalized nonadditive Tsallis entropy.
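
For the Shannon case mentioned first, the mutual information follows from the standard identity I(X;Y) = H(X) + H(Y) - H(X,Y). The sketch below applies it to a small joint probability table; the function names and the two example tables are hypothetical, and the nonadditive (Tsallis) case treated in the paper is not covered.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    pxy = [p for row in joint for p in row]
    return shannon_entropy(px) + shannon_entropy(py) - shannon_entropy(pxy)


independent = [[0.25, 0.25], [0.25, 0.25]]  # X and Y independent fair bits
identical = [[0.5, 0.0], [0.0, 0.5]]        # Y is a copy of X

print(mutual_information(independent))
print(mutual_information(identical))
```

Independent variables share no information, while a variable and its exact copy share the full entropy of one fair bit.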

**Category:** Statistics

[108] **viXra:1512.0293 [pdf]**
*submitted on 2015-12-12 02:40:18*

**Authors:** Amelia Carolina Sparavigna

**Comments:** 4 Pages. Published in International Journal of Sciences, 2015, 4(10):47-50. DOI:10.18483/ijSci.866

Tsallis and Kaniadakis entropies generalize the Shannon entropy and have it as their limit when their entropic indices approach specific values. Here we show some relations existing between Tsallis and Kaniadakis entropies. We will also propose a rigorous discussion of the conditional Kaniadakis entropy, deduced from these relations.
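
The limiting behaviour mentioned here can be checked numerically. The sketch below uses the commonly quoted definitions, with the Tsallis index written as q (limit q → 1) and the Kaniadakis index as kappa (limit kappa → 0), natural logarithms and the Boltzmann constant set to one; these conventions, the function names and the example distribution are assumptions of this illustration rather than notation taken from the paper.

```python
import math


def shannon(p):
    """Shannon entropy with natural logarithm."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def tsallis(p, q):
    """Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1), defined for q != 1."""
    return (1.0 - sum(x ** q for x in p if x > 0.0)) / (q - 1.0)


def kaniadakis(p, kappa):
    """Kaniadakis entropy S_k = -sum (p_i^(1+k) - p_i^(1-k)) / (2k), for k != 0."""
    return -sum(
        (x ** (1.0 + kappa) - x ** (1.0 - kappa)) / (2.0 * kappa)
        for x in p
        if x > 0.0
    )


p = [0.5, 0.3, 0.2]  # hypothetical distribution
print(shannon(p))
print(tsallis(p, 1.0001))    # q close to 1
print(kaniadakis(p, 0.0001)) # kappa close to 0
```

With the indices close to their limiting values, both generalized entropies agree with the Shannon value to several decimal places.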

**Category:** Statistics

[107] **viXra:1511.0233 [pdf]**
*submitted on 2015-11-24 04:47:27*

**Authors:** M. F. Bugallo, L. Martino, J. Corander

**Comments:** Digital Signal Processing, Volume 47, Pages 36–49, 2015.

In Bayesian signal processing, all the information about the unknowns of interest is contained in their posterior distributions. The unknowns can be parameters of a model, or a model and its parameters. In many important problems, these distributions are impossible to obtain in analytical form. An alternative is to generate their approximations by Monte Carlo-based methods like Markov chain Monte Carlo (MCMC) sampling, adaptive importance sampling (AIS) or particle filtering (PF). While MCMC sampling and PF have received considerable attention in the literature and are reasonably well understood, the AIS methodology remains relatively unexplored. This article reviews the basics of AIS and provides a comprehensive survey of the state of the art of the topic. Some of its most relevant implementations are revisited and compared through computer simulation examples.

**Category:** Statistics

[106] **viXra:1511.0232 [pdf]**
*submitted on 2015-11-24 05:31:30*

**Authors:** V. Elvira, L. Martino, D. Luengo, M. F. Bugallo

**Comments:** 38 Pages.

Importance Sampling methods are broadly used to approximate posterior distributions or some of their moments. In its standard approach, samples are drawn from a single proposal distribution and weighted properly. However, since the performance depends on the mismatch between the targeted and the proposal distributions, several proposal densities are often employed for the generation of samples. Under this Multiple Importance Sampling (MIS) scenario, many works have addressed the selection or adaptation of the proposal distributions, interpreting the sampling and the weighting steps in different ways. In this paper, we establish a general framework for sampling and weighting procedures when more than one proposal is available. The most relevant MIS schemes in the literature are encompassed within the new framework, and, moreover, novel valid schemes appear naturally. All the MIS schemes are compared and ranked in terms of the variance of the associated estimators. Finally, we provide illustrative examples which reveal that, even with a good choice of the proposal densities, a careful interpretation of the sampling and weighting procedures can make a significant difference in the performance of the method.
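
To illustrate how the weighting step can be interpreted in more than one way, the sketch below draws the same number of samples from each of two Gaussian proposals and weights them in two commonly described ways: by the density of the proposal that generated each sample, or by the equally weighted mixture of all proposals. The target, the proposals, the sample size and all names are hypothetical; this is not claimed to reproduce the framework or the ranking established in the paper.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def target_unnormalized(x):
    """Hypothetical target: unnormalized standard Gaussian (constant sqrt(2*pi))."""
    return math.exp(-0.5 * x * x)


def mis_estimates(proposals, samples_per_proposal, rng):
    """Estimate the target's normalizing constant with two weighting choices.

    Returns (own_proposal_estimate, mixture_estimate).
    """
    n_total = len(proposals) * samples_per_proposal
    own_sum = 0.0
    mixture_sum = 0.0
    for mean, std in proposals:
        for _ in range(samples_per_proposal):
            x = rng.gauss(mean, std)
            own_density = normal_pdf(x, mean, std)
            mixture_density = sum(normal_pdf(x, m, s) for m, s in proposals) / len(proposals)
            own_sum += target_unnormalized(x) / own_density
            mixture_sum += target_unnormalized(x) / mixture_density
    return own_sum / n_total, mixture_sum / n_total


proposals = [(-2.0, 2.0), (2.0, 2.0)]  # (mean, std) of each proposal
z_own, z_mixture = mis_estimates(proposals, 10000, random.Random(0))
print(z_own, z_mixture, math.sqrt(2.0 * math.pi))
```

Both weightings give valid estimates of the same constant from the same samples; they differ in the variance of the resulting estimator, which is the kind of difference the abstract refers to.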

**Category:** Statistics

[105] **viXra:1511.0003 [pdf]**
*submitted on 2015-11-01 06:07:39*

**Authors:** John R. Dixon

**Comments:** 41 Pages.

This is the technical report to accompany:
Dixon, John R., Michael R. Kosorok, and Bee Leng Lee. "Functional inference in semiparametric models using the piggyback bootstrap." Annals of the Institute of Statistical Mathematics 57, no. 2 (2005): 255-277.

**Category:** Statistics

[104] **viXra:1509.0048 [pdf]**
*submitted on 2015-09-04 05:40:14*

**Authors:** L. Martino, F. Louzada

**Comments:** 13 Pages.

The adaptive rejection sampling (ARS) algorithm is a universal random generator for drawing samples efficiently from a univariate log-concave target probability density function (pdf). ARS generates independent samples from the target via rejection sampling with high acceptance rates. Indeed, ARS yields a sequence of proposal functions that converge toward the target pdf, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. In this work, we propose a novel ARS scheme, called Cheap Adaptive Rejection Sampling (CARS), where the computational effort for drawing from the proposal remains constant, decided in advance by the user. For generating a large number of desired samples, CARS is faster than ARS.

**Category:** Statistics

[103] **viXra:1508.0265 [pdf]**
*submitted on 2015-08-27 02:35:07*

**Authors:** B. B. Khare, Habib Ur Rehman, U. Srivastava

**Comments:** 10 Pages.

In this paper, a study of improved chain ratio-cum-regression type estimator for population mean in the presence of non-response for fixed cost and specified precision has been made. Theoretical results are supported by carrying out one numerical illustration.

**Category:** Statistics

[102] **viXra:1508.0256 [pdf]**
*submitted on 2015-08-27 02:50:36*

**Authors:** B. B. Khare

**Comments:** 8 Pages.

The auxiliary information is used in increasing the efficiency of the estimators for the parameters of the populations such as mean, ratio, and product of two population means. In this context, the estimation procedure for the ratio and product of two population means using auxiliary characters in special reference to the non-response problem has been discussed.

**Category:** Statistics

[101] **viXra:1508.0142 [pdf]**
*submitted on 2015-08-18 02:29:47*

**Authors:** L. Martino, F. Louzada

**Comments:** 17 Pages.

The Multiple Try Metropolis (MTM) algorithm is an advanced MCMC technique based on drawing and testing several candidates at each iteration of the algorithm. One of them is selected according to certain weights and then it is tested according to a suitable acceptance probability. Clearly, since the computational cost increases as the employed number of tries grows, one expects that the performance of an MTM scheme improves as the number of tries increases, as well. However, there are scenarios where the increase of number of tries does not produce a corresponding enhancement of the performance. In this work, we describe these scenarios and then we introduce possible solutions for solving these issues.

**Category:** Statistics

[100] **viXra:1507.0125 [pdf]**
*submitted on 2015-07-16 09:20:20*

**Authors:** editors Rajesh Singh, Florentin Smarandache

**Comments:** 54 Pages.

The present book aims to present some improved estimators using auxiliary and attribute information in the case of simple random sampling and stratified random sampling, and in some cases when non-response is present.
This volume is a collection of five papers, written by seven co-authors (listed in the order of the papers): Sachin Malik, Rajesh Singh, Florentin Smarandache, B. B. Khare, P. S. Jha, Usha Srivastava and Habib Ur Rehman.
The first and the second papers deal with the problem of estimating the finite population mean when some information on two auxiliary attributes is available. In the third paper, problems related to estimation of the ratio and product of two population means using auxiliary characters, with special reference to non-response, are discussed.
In the fourth paper, the problem of estimating the population mean using the coefficient of variation and shape parameters in each stratum is considered. In the fifth paper, a study of an improved chain ratio-cum-regression type estimator for the population mean in the presence of non-response for fixed cost and specified precision has been made.
The authors hope that the book will be helpful for researchers and students working in the field of sampling techniques.

**Category:** Statistics

[99] **viXra:1507.0110 [pdf]**
*submitted on 2015-07-14 15:18:08*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander, F. Louzada

**Comments:** 20 Pages.

Monte Carlo (MC) methods are widely used in signal processing, machine learning and stochastic optimization. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes for reducing the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. We also discuss the application of O-MCMC in a big data framework.
Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and parameter choice.

**Category:** Statistics

[98] **viXra:1507.0029 [pdf]**
*submitted on 2015-07-05 07:21:38*

**Authors:** Khaled Ouafi

**Comments:** 9 Pages.

We investigate the issue of approximate Bayesian parameter inference in nonlinear state space models with complex likelihoods. Sequential Monte Carlo with approximate Bayesian computations (SMC-ABC) is an approach to approximate the likelihood in this type of model. However, such approximations can be noisy and computationally expensive, which hinders cost-effective implementations using standard methods based on optimisation and statistical simulation. We propose an innovative method based on the combination of Gaussian process optimisation (GPO) and SMC-ABC to create a Laplace approximation of the intractable posterior. The properties of the resulting GPO-ABC method are studied using stochastic volatility (SV) models with both synthetic and real-world data. We conclude that the algorithm enjoys good accuracy, comparable to particle Markov chain Monte Carlo, with a significant reduction in computational cost, and better robustness to noise in the estimates than a gradient-based optimisation algorithm. Finally, we make use of GPO-ABC to estimate the Value-at-Risk for a portfolio using a copula model with SV models for the margins.

**Category:** Statistics

[97] **viXra:1506.0175 [pdf]**
*submitted on 2015-06-24 13:01:14*

**Authors:** Ilija Barukčić

**Comments:** 19 pages. (C) Ilija Barukčić, Jever, Germany, 2015,

The deterministic relationship between cause and effect is deeply connected with our understanding of the physical sciences and their explanatory ambitions. Though progress is being made, the lack of theoretical predictions and experiments in quantum gravity makes it difficult to use empirical evidence to justify a theory of causality at the quantum level in normal circumstances, i.e. by predicting the value of a well-confirmed experimental result. For a variety of reasons, the problem of the deterministic relationship between cause and effect is related to basic problems of physics as such. Despite the common belief, it is a remarkable fact that a theory of causality should be consistent with a theory of everything and is because of this linked to problems of a theory of everything. Thus, solving the problem of causality can help to solve the problems of the theory of everything (at the quantum level) too.

**Category:** Statistics

[96] **viXra:1506.0067 [pdf]**
*submitted on 2015-06-08 14:58:47*

**Authors:** Christopher Goddard

**Comments:** 4 Pages.

It is a common problem in statistics to determine the appropriate heuristic to select from a set of hypotheses (or equivalently, models), prior to optimising that model to fit the data. In this short note I sketch a technique based on the construction of an information in order to compute the optimal model within a given model space and given data.

**Category:** Statistics

[95] **viXra:1505.0136 [pdf]**
*submitted on 2015-05-19 00:31:36*

**Authors:** Vorobyev O.Yu., Golovkov L.S.

**Comments:** 10 Pages.

This article introduces two new discrete distributions: the multidimensional binomial
distribution and the multidimensional Poisson distribution. Their characteristics and properties are also presented.

**Category:** Statistics

[94] **viXra:1505.0135 [pdf]**
*submitted on 2015-05-18 10:45:07*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 26 Pages.

Monte Carlo algorithms represent the *de facto* standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler proposal probability densities for drawing candidate samples. Performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a *layered*, that is, a hierarchical, procedure for generating samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal distribution is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in the complexity of the resulting algorithm. A hierarchical interpretation of two well-known methods, namely the random walk Metropolis-Hastings (MH) and the Population Monte Carlo (PMC) techniques, is provided.
Furthermore, we provide a general unified importance sampling (IS) framework where multiple proposal densities are employed, and several IS schemes are introduced applying the so-called deterministic mixture approach.
Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms combine efficiently the benefits of both IS and MCMC methods.

**Category:** Statistics

[93] **viXra:1503.0088 [pdf]**
*submitted on 2015-03-12 09:09:50*

**Authors:** Jianwen Huang, Shouquan Chen

**Comments:** 10 Pages.

We introduce the logarithmic generalized Maxwell
distribution, which is an extension of the generalized Maxwell
distribution. Some interesting properties of this distribution are
studied, and the asymptotic distribution of the partial maximum of an
independent and identically distributed sequence from the
logarithmic generalized Maxwell distribution is derived.

**Category:** Statistics

[92] **viXra:1412.0276 [pdf]**
*submitted on 2014-12-31 01:34:35*

**Authors:** Jianwen Huang, Yanmin Liu

**Comments:** 7 Pages.

In this paper, with optimal normalizing constants,
the asymptotic expansions of the distribution of the normalized
maxima from the generalized Maxwell distribution are derived. They show
that the convergence rate of the normalized maxima to the Gumbel
extreme value distribution is proportional to $1/\log n$.

**Category:** Statistics

[91] **viXra:1412.0275 [pdf]**
*submitted on 2014-12-31 01:42:41*

**Authors:** Jianwen Huang, Yanmin Liu

**Comments:** 12 Pages.

In this paper, the higher-order asymptotic
expansion of the moment of extremes from the generalized Maxwell
distribution is derived, by which one establishes the rate of
convergence of the moment of the normalized partial
maximum to the moment of the associated Gumbel extreme value distribution.

**Category:** Statistics

[90] **viXra:1412.0247 [pdf]**
*submitted on 2014-12-26 15:30:26*

**Authors:** Sergio Arciniegas-Alarcón, Marisol García-Peña, Wojtek Krzanowski, Carlos Tadeu dos Santos Dias

**Comments:** 14 Pages.

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between these estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.

**Category:** Statistics

[89] **viXra:1412.0003 [pdf]**
*submitted on 2014-12-01 04:45:04*

**Authors:** Marisol García-Peña, Sergio Arciniegas-Alarcón, Décio Barbin

**Comments:** 10 Pages.

A common problem in climate data is missing information. Recently, four methods have been developed which are based on the singular value decomposition (SVD) of a matrix. The aim of this paper is to evaluate these new developments by means of a simulation study based on two complete matrices of real data. One corresponds to the historical precipitation of Piracicaba / SP - Brazil and the other matrix corresponds to multivariate meteorological characteristics in the same city from 1997 to 2012. In the study, values were deleted randomly at different percentages with subsequent imputation, and the methodologies were compared by three criteria: the normalized root mean squared error, the Procrustes similarity statistic and the Spearman correlation coefficient. It was concluded that the SVD should be used only when multivariate matrices are analyzed; when matrices of precipitation are used, the monthly mean outperforms the methods based on the SVD.

**Category:** Statistics

[88] **viXra:1411.0396 [pdf]**
*submitted on 2014-11-20 03:16:54*

**Authors:** A. Borumand Saeid, A. Namdar

**Comments:** 7 Pages.

We introduce the notion of a Smarandache BCH-algebra and of Smarandache (fresh, clean and fantastic) ideals; some examples are given and related properties are investigated. Relationships between
Q-Smarandache (fresh, clean and fantastic) ideals and other types of ideals are given. Extension properties for Q-Smarandache (fresh, clean and fantastic) ideals are established.

**Category:** Statistics

[87] **viXra:1411.0270 [pdf]**
*submitted on 2014-11-19 01:04:21*

**Authors:** Florentin Smarandache

**Comments:** 2 Pages.

In this note the author presents a new proof for the theorem of I. Patrascu.

**Category:** Statistics

[86] **viXra:1411.0267 [pdf]**
*submitted on 2014-11-19 01:14:33*

**Authors:** Florentin Smarandache

**Comments:** 1 Page.

Is it possible to cover all (positive) integers with n geometrical progressions of integers?
Find a necessary and sufficient condition for a general class of positive integer sequences
such that, for a fixed n, there are n (distinct) sequences of this class which cover all integers.

**Category:** Statistics

[85] **viXra:1411.0265 [pdf]**
*submitted on 2014-11-19 01:17:32*

**Authors:** Marian Niţu, Florentin Smarandache, Mircea Eugen Şelariu

**Comments:** 22 Pages.

The central idea of this paper is the presentation of some new transformations, previously nonexistent in ordinary mathematics, here called centric mathematics (MC), which have become possible thanks to the emergence of eccentric mathematics and, implicitly, of supermathematics.

**Category:** Statistics

[84] **viXra:1411.0264 [pdf]**
*submitted on 2014-11-19 01:18:41*

**Authors:** Mircea E. Şelariu, Florentin Smarandache, Marian Nitu

**Comments:** 18 Pages.

The paper presents the eccentric-mathematics counterparts of the cardinal and integral functions of centric mathematics, or ordinary mathematics. The centric functions are also presented in the introduction of the paper, because they are too little known, although they are widely used in wave physics.

**Category:** Statistics

[83] **viXra:1411.0260 [pdf]**
*submitted on 2014-11-19 01:38:40*

**Authors:** Octavian Cira, Florentin Smarandache

**Comments:** 8 Pages.

The first prime number with the special property that adding it to its reversal also gives a prime number is 229. The prime numbers with this property will be called Luhn prime numbers. In this article we intend to present an efficient
algorithm for determining the Luhn prime numbers.
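
The defining property can be checked directly. The following is a minimal sketch, not the authors' algorithm: it assumes plain trial division for primality testing and decimal digit reversal, and the function names `is_prime`, `is_luhn_prime` and `luhn_primes` are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_luhn_prime(p: int) -> bool:
    """True if p is prime and p plus its decimal reversal is also prime."""
    reversal = int(str(p)[::-1])
    return is_prime(p) and is_prime(p + reversal)


def luhn_primes(limit: int) -> list[int]:
    """All Luhn primes p with 2 <= p <= limit (brute-force search)."""
    return [p for p in range(2, limit + 1) if is_luhn_prime(p)]
```

Calling `luhn_primes(limit)` scans every integer up to `limit`, so it is only suitable for small bounds; a faster search would need a sieve or a stronger primality test.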

**Category:** Statistics

[82] **viXra:1411.0258 [pdf]**
*submitted on 2014-11-19 01:40:47*

**Authors:** Said Broumi, Pinaki Majumdar, Florentin Smarandache

**Comments:** 11 Pages.

In this paper, we define the First Zadeh's implication, the First Zadeh's intuitionistic fuzzy conjunction and the intuitionistic fuzzy disjunction of two intuitionistic fuzzy soft sets, and some of their basic properties are studied with proofs and examples.

**Category:** Statistics

[81] **viXra:1411.0255 [pdf]**
*submitted on 2014-11-19 02:04:12*

**Authors:** Ion Patrascu, Florentin Smarandache

**Comments:** 3 Pages.

Open problem
Construct, using a ruler and a compass, two non-congruent triangles, which have equal
perimeters and areas.
In preparation for the proof of this problem we recall several notions and we prove a
Lemma.

**Category:** Statistics

[80] **viXra:1411.0253 [pdf]**
*submitted on 2014-11-19 02:07:41*

**Authors:** C. Dumitrescu, N. Varlan, St. Zanfir, N. Radescu, F. Smarandache

**Comments:** 23 Pages.

In this paper we extend the Smarandache function.

**Category:** Statistics

[79] **viXra:1411.0252 [pdf]**
*submitted on 2014-11-19 02:09:03*

**Authors:** Ion Patrascu

**Comments:** 6 Pages.

In this article, we review some properties of the harmonic quadrilateral related to triangle symmedians and to Apollonius circles.

**Category:** Statistics

[78] **viXra:1411.0072 [pdf]**
*submitted on 2014-11-08 15:25:10*

**Authors:** Suhoparov Stanislav Yurievich

**Comments:** 5 Pages.

Derivation of the recurrence relation for orthogonal polynomials from the Gram-Schmidt orthogonalization process, together with a scheme for applying the resulting recurrence relation.
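
For orientation, the standard three-term recurrence that Gram-Schmidt orthogonalization of the monomials $1, x, x^2, \dots$ yields can be written as follows. This is the textbook form for monic orthogonal polynomials under an inner product satisfying $\langle x f, g \rangle = \langle f, x g \rangle$; the symbols $p_n$, $\alpha_n$ and $\beta_n$ are assumptions of this sketch and need not match the paper's notation.

$$
p_{n+1}(x) = (x - \alpha_n)\, p_n(x) - \beta_n\, p_{n-1}(x), \qquad
\alpha_n = \frac{\langle x p_n, p_n \rangle}{\langle p_n, p_n \rangle}, \qquad
\beta_n = \frac{\langle p_n, p_n \rangle}{\langle p_{n-1}, p_{n-1} \rangle},
$$

with $p_{-1}(x) = 0$ and $p_0(x) = 1$. Only the two most recent polynomials are needed at each step, because $x p_n$ is orthogonal to every $p_k$ with $k < n - 1$.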

**Category:** Statistics

[77] **viXra:1411.0064 [pdf]**
*submitted on 2014-11-07 17:22:14*

**Authors:** Jean Claude Dutailly

**Comments:** 16 Pages.

The purpose of this paper is to present a general method to estimate the probability of transitions of a system between phases. The system must be represented in a quantitative model, with vectorial variables depending on time, satisfying general conditions which are usually met. The method can be implemented in Physics, Economics or Finance.

**Category:** Statistics

[76] **viXra:1411.0016 [pdf]**
*submitted on 2014-11-03 07:05:31*

**Authors:** Sergio Arciniegas-Alarcón, Marisol García-Peña, Wojtek Krzanowski, Carlos Tadeu dos Santos Dias

**Comments:** 17 Pages.

Missing values for some genotype-environment combinations are commonly encountered in multienvironment trials. The recommended methodology for analyzing such unbalanced data combines the Expectation-Maximization (EM) algorithm with the additive main effects and multiplicative interaction (AMMI) model. Recently, however, four imputation algorithms based on the Singular Value Decomposition of a matrix (SVD) have been reported in the literature (Biplot imputation, EM+SVD, GabrielEigen imputation, and distribution free multiple imputation - DFMI). These algorithms all fill in the missing values, thereby removing the lack of balance in the original data and permitting simpler standard analyses to be performed. The aim of this paper is to compare these four algorithms with the gold standard EM-AMMI. To do this, we report the results of a simulation study based on three complete sets of real data (eucalyptus, sugar cane and beans) for various imputation percentages. The methodologies were compared using the normalised root mean squared error, the Procrustes similarity statistic and the Spearman correlation coefficient. The conclusion is that imputation using the EM algorithm plus SVD provides competitive results to those obtained with the gold standard. It is also an excellent alternative to imputation with an additive model, which in practice ignores the genotype-by-environment interaction and therefore may not be appropriate in some cases.

**Category:** Statistics

[75] **viXra:1410.0191 [pdf]**
*submitted on 2014-10-29 07:37:19*

**Authors:** Carlos Tadeu dos Santos Dias, Kuang Hongyu, Lúcio B. Araújo, Maria Joseane C. Silva, Marisol García-Peña, Mirian F. C. Araújo, Priscila N. Faria, Sergio Arciniegas-Alarcón

**Comments:** 19 Pages. Paper in Portuguese.

This work is based on the short course “A Metodologia AMMI: Com Aplicacão ao Melhoramento Genético” taught during the 58a RBRAS and 15o SEAGRO held in Campina Grande - PB and aims to introduce the AMMI method to readers with and without mathematical training. We do not intend to present a detailed work; the intention is to serve as a guide for researchers, graduate and postgraduate students. In other words, it is a work to stimulate research and the quest for knowledge in an area of statistical methods. For this purpose we review the genotype-by-environment interaction, the definition of the AMMI models, some selection criteria and the biplot graphic. More details can be found in the material produced for the short course.

**Category:** Statistics

[74] **viXra:1410.0121 [pdf]**
*submitted on 2014-10-21 11:16:26*

**Authors:** Sergio Arciniegas-Alarcón, Carlos Tadeu dos Santos Dias, Marisol García-Peña

**Comments:** 9 Pages. Paper in Portuguese with abstract in English.

The objective of this work was to propose a new distribution-free multiple imputation algorithm, through modifications of the simple imputation method recently developed by Yan, in order to circumvent the problem of unbalanced experiments. The method uses the singular value decomposition of a matrix and was tested using simulations based on two complete matrices of real data, obtained from eucalyptus and sugarcane trials, with values deleted randomly at different percentages. The quality of the imputations was evaluated by a measure of overall accuracy that combines the variance between imputations and their mean square deviations in relation to the deleted values. The best alternative for multiple imputation is a multiplicative model that includes weights close to 1 for the eigenvalues calculated with the decomposition. The proposed methodology does not depend on distributional or structural assumptions and does not have any restriction regarding the pattern or the mechanism of the missing data.

**Category:** Statistics

[73] **viXra:1410.0077 [pdf]**
*submitted on 2014-10-14 13:14:47*

**Authors:** T. Prabhakar Reddy, S. Sambasiva Rao, P. Ramu

**Comments:** 13 Pages. This paper has been published in Journal of Physical Education and Sports Science, pp.226-234,Vol 2, 2014. ISSN 2229-7049.

The unpredictability of limited-overs cricket brings excitement for the audience, who expect mayhem on the field. The audience's expectation of watching a good match may be ruined by an interruption due to bad weather or other circumstances. It is therefore necessary to adjust the target score in a reasonable manner when an interrupted match resumes. Several mathematical models for resetting the target in interrupted one-day international (ODI) cricket matches are available in the literature; none of them is optimal for the Twenty20 (T20) format. The purpose of this note is to review the existing Rain Rules for resetting targets in interrupted ODI cricket matches and to propose a method for resetting targets in an interrupted T20 cricket match, with suitable illustrative examples.

**Category:** Statistics

[72] **viXra:1410.0070 [pdf]**
*submitted on 2014-10-13 10:09:31*

**Authors:** Huang Jianwen, Yang Hongyan

**Comments:** 6 Pages.

Let $\{X_n,~n\geq1\}$ be independent and identically distributed random
variables with each $X_n$ following the skew normal distribution. Let
$M_n=\max\{X_k,~1\leq k\leq n\}$ denote the partial maximum of
$\{X_n,~n\geq1\}$. Liao et al. (2014) considered the convergence
rate of the distribution of the maxima for random variables obeying
the skew normal distribution under linear normalization. In this
paper, we obtain the asymptotic distribution of the maximum under power
normalization, the normalizing constants, and the associated pointwise
convergence rate under power normalization.

**Category:** Statistics

[71] **viXra:1409.0127 [pdf]**
*submitted on 2014-09-16 10:08:05*

**Authors:** Jianwen Huang, Shouquan Chen

**Comments:** 15 Pages.

Let $\{X_n,~n\geq1\}$ be an independent
and identically distributed random sequence with common
distribution $F$ obeying the lognormal distribution. In
this paper, we obtain the exact uniform convergence rate of the
distribution of the maximum to its extreme value limit under power normalization.

**Category:** Statistics

[70] **viXra:1409.0119 [pdf]**
*submitted on 2014-09-15 10:24:34*

**Authors:** Jianwen Huang, Shouquan Chen

**Comments:** 9 Pages.

Motivated by Finner et al. (2008), the
asymptotic behavior of the probability density function (pdf) and
the cumulative distribution function (cdf) of the generalized
exponential and Maxwell distributions is studied. Specifically, we
consider the asymptotic behavior of the ratio of the pdfs (cdfs) of
the generalized exponential and Student's $t$-distributions (likewise
for the Maxwell and Student's $t$-distributions) as the degrees of
freedom parameter approaches infinity in an appropriate way. As
by-products, Mills' ratios for the generalized exponential and Maxwell
distributions are obtained. Moreover, we give some examples to
indicate the application of our results in extreme value theory.

**Category:** Statistics

[69] **viXra:1409.0051 [pdf]**
*submitted on 2014-09-08 03:03:33*

**Authors:** L. Martino, J. Corander

**Comments:** 10 Pages.

Markov Chain Monte Carlo (MCMC) methods are well-known Monte Carlo methodologies, widely used in different fields for statistical inference and stochastic optimization. The Multiple Try Metropolis (MTM) algorithm is an extension of the standard Metropolis-Hastings (MH) algorithm in which the next state of the chain is chosen among a set of candidates, according to certain weights.
The Particle MH (PMH) algorithm is another advanced MCMC technique specifically designed for scenarios where the multidimensional target density can be easily factorized as a product of (lower-dimensional) conditional densities. Both are widely studied and applied in the literature. In this note, we investigate similarities and differences among the MTM schemes and the PMH method.

**Category:** Statistics

[68] **viXra:1409.0015 [pdf]**
*submitted on 2014-09-02 11:32:22*

**Authors:** Ellida M. Khazen

**Comments:** 25 Pages.

The problem of filtering the unobservable components x(t) of a multidimensional continuous diffusion Markov process z(t)=(x(t),y(t)), given observations of the (multidimensional) process y(t) taken at discrete consecutive times with small time steps, is analytically investigated. On the basis of that investigation, new algorithms for simulation of the unobservable components x(t) and new algorithms for nonlinear filtering using sequential Monte Carlo methods, or particle filters, are developed and suggested. The analytical investigation of observed quadratic variations is also developed. New closed-form analytical formulae are obtained, which characterize dispersions of deviations of the observed quadratic variations and the accuracy of some estimates for x(t). As an illustrative example, estimation of volatility (for the problems of financial mathematics) is considered. The new algorithms extend the range of applications of sequential Monte Carlo methods, or particle filters, beyond hidden Markov models and improve their performance.

**Category:** Statistics

[67] **viXra:1405.0280 [pdf]**
*submitted on 2014-05-21 11:13:00*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 18 Pages.

Monte Carlo (MC) methods are well-known techniques in different fields such as signal processing, communications and machine learning. A well-known class of MC methods is composed of importance sampling (IS) and its adaptive extensions, e.g., Adaptive Multiple IS (AMIS) and Population Monte Carlo (PMC). In this work, we introduce an adaptive and iterated importance sampler using a population of proposal densities. The novel algorithm, called Adaptive Population Importance Sampling (APIS), provides a global estimation of the variables of interest iteratively, using all the samples generated. APIS mixes together different convenient features of the AMIS and PMC schemes. Furthermore, APIS simultaneously uses both simple and more sophisticated approaches (such as the deterministic mixture) to build the IS estimators. The cloud of proposals is adapted by learning from a subset of previously generated samples, in such a way that local features of the target density can be better taken into account compared to single global adaptation procedures. Numerical results show the advantages of the proposed sampling scheme in terms of mean square error. The resulting algorithm is also more robust in terms of sensitivity to the initial choice of the parameters, compared with other techniques such as AMIS and PMC.

**Category:** Statistics

[66] **viXra:1405.0263 [pdf]**
*submitted on 2014-05-18 08:40:10*

**Authors:** L. Martino, H. Yang, D. Luengo, J. Kanniainen, J. Corander

**Comments:** 19 Pages.

Gibbs sampling is a well-known Markov Chain Monte Carlo (MCMC) technique, widely applied to draw samples from multivariate target distributions which appear often in many different fields (machine learning, finance, signal processing, etc.). The application of the Gibbs sampler requires being able to draw efficiently from the univariate full-conditional distributions. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm that produces virtually independent samples from the target. The proposal density used is self-tuned to the specific target but it is not adaptive. Instead, the proposal is adjusted during the initialization stage following a simple procedure.
As a consequence, there is no "fuss" about convergence or tuning, and the execution of the algorithm is remarkably sped up. Although it can be used as a stand-alone algorithm to sample from a generic univariate distribution, the proposed approach is particularly suited for use within a Gibbs sampler, especially when sampling from spiky multi-modal distributions. Hence, we call it FUSS (Fast Universal Self-tuned Sampler). Numerical experiments on several synthetic and real data sets show its good performance in terms of speed and estimation accuracy.

**Category:** Statistics

[92] **viXra:1802.0192 [pdf]**
*replaced on 2018-02-16 14:52:45*

**Authors:** Stephen P. Smith

**Comments:** 9 Pages.

Yu (2000) described the use of the Gibbs sampler to estimate regression parameters where the information available in the form of dependent variables is limited to rank information, and where the linear model applies to the underlying variation beneath the ranks. The approach uses an imputation step, which consists of nested draws from truncated normal distributions where the underlying variation is simulated as part of a broader Bayesian simulation. The method is general enough to treat rank information that represents ties or partial orderings.

**Category:** Statistics

[91] **viXra:1712.0244 [pdf]**
*replaced on 2018-01-13 13:47:16*

**Authors:** Luca Martino

**Comments:** 46 Pages. (to appear) Digital Signal Processing, 2018.

Many applications in signal processing require the estimation of some parameters of interest given a set of observed data. More specifically, Bayesian inference needs the computation of a posteriori estimators which are often expressed as complicated multi-dimensional integrals. Unfortunately, analytical expressions for these estimators cannot be found in most real-world applications, and Monte Carlo methods are the only feasible approach. A very powerful class of Monte Carlo techniques is formed by the Markov Chain Monte Carlo (MCMC) algorithms. They generate a Markov chain such that its stationary distribution coincides with the target posterior density. In this work, we perform a thorough review of MCMC methods using multiple candidates in order to select the next state of the chain, at each iteration. With respect to the classical Metropolis-Hastings method, the use of multiple try techniques fosters the exploration of the sample space. We present different Multiple Try Metropolis schemes, Ensemble MCMC methods, Particle Metropolis-Hastings algorithms and the Delayed Rejection Metropolis technique. We highlight limitations, benefits, connections and differences among the different methods, and compare them by numerical simulations.

**Category:** Statistics

[90] **viXra:1712.0244 [pdf]**
*replaced on 2018-01-13 08:15:36*

**Authors:** Luca Martino

**Comments:** 46 Pages. (to appear) Digital Signal Processing, 2018.

Many applications in signal processing require the estimation of some parameters of interest given a set of observed data. More specifically, Bayesian inference needs the computation of a posteriori estimators which are often expressed as complicated multi-dimensional integrals. Unfortunately, analytical expressions for these estimators cannot be found in most real-world applications, and Monte Carlo methods are the only feasible approach. A very powerful class of Monte Carlo techniques is formed by the Markov Chain Monte Carlo (MCMC) algorithms. They generate a Markov chain such that its stationary distribution coincides with the target posterior density. In this work, we perform a thorough review of MCMC methods using multiple candidates in order to select the next state of the chain, at each iteration. With respect to the classical Metropolis-Hastings method, the use of multiple try techniques fosters the exploration of the sample space. We present different Multiple Try Metropolis schemes, Ensemble MCMC methods, Particle Metropolis-Hastings algorithms and the Delayed Rejection Metropolis technique. We highlight limitations, benefits, connections and differences among the different methods, and compare them by numerical simulations.

**Category:** Statistics

[89] **viXra:1710.0311 [pdf]**
*replaced on 2017-10-31 04:20:07*

**Authors:** Ilija Barukčić

**Comments:** 13 pages. Copyright © 2017 by Ilija Barukčić, Jever, Germany. All rights reserved. Published by:

Background: The aim of this study is to evaluate the possible relationship between human papillomavirus (HPV) and malignant melanoma.
Objectives: In this systematic review we re-analysed the study of Roussaki-Schulze et al. and the study of La Placa et al. so that some new inferences can be drawn.
Materials and methods: Roussaki-Schulze et al. obtained data from 28 human melanoma biopsy specimens and from 6 healthy individuals. La Placa et al. investigated 51 primary melanoma (PM) samples and 20 control skin samples. The HPV DNA was determined by polymerase chain reaction (PCR).
Statistical Analysis: The method of the conditio per quam relationship was used to test the hypothesis that the presence of human papillomavirus (HPV) guarantees the presence of malignant melanoma. In other words, if human papillomavirus (HPV) is present, then malignant melanoma is present too. The mathematical formula of the causal relationship k was used to test the hypothesis that there is a cause-effect relationship between human papillomavirus (HPV) and malignant melanoma. Significance was indicated by a p-value of less than 0.05.
Results: Based on the data as published by Roussaki-Schulze et al. and the data of La Placa et al., the presence of human papillomavirus (HPV) guarantees the presence of malignant melanoma. In other words, human papillomavirus (HPV) is a conditio per quam of malignant melanoma. In contrast to the study of La Placa et al. and contrary to expectation, the study of Roussaki-Schulze et al., which is based on a very small sample size, failed to provide evidence of a significant cause-effect relationship between human papillomavirus (HPV) and malignant melanoma.
Conclusions: Human papillomavirus (HPV) is a necessary condition of malignant melanoma. Human papillomavirus (HPV) is a cause of malignant melanoma.

**Category:** Statistics

[88] **viXra:1710.0311 [pdf]**
*replaced on 2017-10-29 17:30:32*

**Authors:** Ilija Barukčić

**Comments:** 13 pages. Copyright © 2017 by Ilija Barukčić, Jever, Germany. All rights reserved. Published by:

Background: The aim of this study is to evaluate the possible relationship between human papillomavirus (HPV) and malignant melanoma.
Objectives: In this systematic review we re-analysed the study of Roussaki-Schulze et al. and La Placa et al. so that some new inferences can be drawn.
Materials and methods: Roussaki-Schulze et al. obtained data from 28 human melanoma biopsy specimens and from 6 healthy individuals. La Placa et al. investigated 51 primary melanoma (PM) samples and 20 control skin samples. The HPV DNA was determined by polymerase chain reaction (PCR).
Statistical Analysis: The method of the conditio per quam relationship was used to test the hypothesis that the presence of human papillomavirus (HPV) guarantees the presence of malignant melanoma. In other words, if human papillomavirus (HPV) is present, then malignant melanoma must be present too. The mathematical formula of the causal relationship k was used to test the hypothesis that there is a cause-effect relationship between human papillomavirus (HPV) and malignant melanoma. Significance was indicated by a p-value of less than 0.05.
Results: Based on the data as published by Roussaki-Schulze et al. and La Placa et al., the presence of human papillomavirus (HPV) guarantees the presence of malignant melanoma. In other words, human papillomavirus (HPV) is a conditio per quam of malignant melanoma. In contrast to the study of La Placa et al. and contrary to expectation, the study of Roussaki-Schulze et al., which is based on a very small sample size, failed to provide a significant cause-effect relationship between human papillomavirus (HPV) and malignant melanoma.
Conclusions: Human papillomavirus (HPV) is a necessary condition of malignant melanoma. Human papillomavirus (HPV) is a cause of malignant melanoma.

**Category:** Statistics

[87] **viXra:1710.0261 [pdf]**
*replaced on 2017-12-01 10:20:35*

**Authors:** Russell Leidich

**Comments:** 20 Pages.

The Jensen-Shannon divergence (JSD) quantifies the “information distance” between a pair of probability distributions. (A more generalized version, which is beyond the scope of this paper, is given in [1]. It extends this divergence to arbitrarily many such distributions. Related divergences are presented in [2], which is an excellent summary of existing work.)

A couple of novel applications for this divergence are presented herein, both of which involve sets of whole numbers constrained by some nonzero maximum value. (We’re primarily concerned with discrete applications of the JSD, although it’s defined for analog variables.) The first of these, which we can call the “Jensen-Shannon divergence transform” (JSDT), involves a sliding “sweep window” whose JSD with respect to some fixed “needle” is evaluated at each step as said window moves from left to right across a superset called a “haystack”.

The second such application, which we can call the “Jensen-Shannon exodivergence transform” (JSET), measures the JSD between a sweep window and an “exosweep”, that is, the haystack minus said window, at all possible locations of the latter. The JSET turns out to be exceptionally good at detecting anomalous contiguous subsets of a larger set of whole numbers.

We then investigate and attempt to improve upon the shortcomings of the JSD and the related Kullback-Leibler divergence (KLD).
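
As a rough illustration of the sliding-window idea described above, here is a minimal Python sketch of the discrete JSD and a JSDT-style sweep. It is not the author's implementation: the function names, the `width` parameter, the use of empirical frequencies, and the base-2 logarithm are all assumptions made for this example.

```python
import math
from collections import Counter


def distribution(values, max_value):
    """Empirical probability of each whole number 0..max_value (assumed encoding)."""
    counts = Counter(values)
    total = len(values)
    return [counts.get(v, 0) / total for v in range(max_value + 1)]


def kld(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: mean KLD of p and q from their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


def jsd_transform(haystack, needle, width, max_value):
    """JSD between the fixed needle and each window of `width` items,
    sweeping left to right across the haystack."""
    needle_dist = distribution(needle, max_value)
    return [
        jsd(distribution(haystack[i:i + width], max_value), needle_dist)
        for i in range(len(haystack) - width + 1)
    ]
```

With base-2 logarithms the JSD lies between 0 (identical distributions) and 1 (disjoint support), so a window that matches the needle shows up as a dip toward 0 in the output of `jsd_transform`.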

[86] **viXra:1710.0029 [pdf]**
*replaced on 2018-01-05 07:06:47*

**Authors:** Feng Zhang, Jianjun Wang, Yao Wang, Jianwen Huang

**Comments:** 28 Pages.

In the era of big data, multi-modal data can be seen everywhere. Research on such data has attracted extensive attention in the past few years. In this paper, we investigate perturbations of compressed data separation with redundant tight frames via $\tilde{\Phi}$-$\ell_q$-minimization. By exploiting the properties of the redundant tight frame and the perturbation matrix, i.e., mutual coherence, null space property and restricted isometry property, the condition on reconstruction of sparse signal with redundant tight frames is established and the error estimation between the local optimal solution and the original signal is also provided. Numerical experiments are carried out to show that $\tilde{\Phi}$-$\ell_q$-minimization is robust and stable for the reconstruction of sparse signal with redundant tight frames. To our knowledge, our work may be the first study concerning perturbations of the measurement matrix and the redundant tight frame for compressed data separation.

**Category:** Statistics

[85] **viXra:1709.0359 [pdf]**
*replaced on 2017-09-26 21:38:43*

**Authors:** Jianwen Huang, Jianjun Wang, Wendong Wang

**Comments:** 15 Pages.

This work gains a sharp sufficient condition on the block restricted isometry property for the recovery of sparse signals. Under this condition, sparse signals with block structure can be stably recovered in the presence of noise, and block sparse signals can be assuredly reconstructed in the noise-free case. Besides, to show that the condition is sharp, we offer an example. As a byproduct, when $t=1$, the result improves the bound on the block restricted isometry constant $\delta_{s|\mathcal{I}}$ in Lin and Li (Acta Math. Sin. Engl. Ser. 29(7): 1401-1412, 2013).

**Category:** Statistics

[84] **viXra:1706.0425 [pdf]**
*replaced on 2018-02-25 12:10:39*

**Authors:** Yuri Heymann

**Comments:** 15 Pages.

The motivation of this study is to investigate new methods for the calculation of the moment-generating function of the lognormal distribution. Taylor expansion method on the moments of the lognormal suffers from divergence issues, saddle-point approximation is not exact, and integration methods can be complicated. In the present paper we introduce a new probability measure that we refer to as the star probability measure as an alternative approach to compute the moment-generating function of normal variate functionals such as the lognormal distribution.

**Category:** Statistics

[83] **viXra:1706.0425 [pdf]**
*replaced on 2018-02-11 12:58:23*

**Authors:** Yuri Heymann

**Comments:** 15 Pages.

The motivation of this study is to investigate new methods for the calculation of the moment-generating function of the lognormal distribution. Taylor expansion method on the moments of the lognormal suffers from divergence issues, saddle-point approximation is not exact, and integration methods can be complicated. In the present paper we introduce a new probability measure that we refer to as the star probability measure as an alternative approach to compute the moment-generating function of normal variate functionals such as the lognormal distribution.

**Category:** Statistics

[82] **viXra:1706.0425 [pdf]**
*replaced on 2018-01-18 17:40:27*

**Authors:** Yuri Heymann

**Comments:** 16 Pages.

The motivation of this study is to investigate new methods for the calculation of the moment-generating function of the lognormal distribution. Taylor expansion method on the moments of the lognormal suffers from divergence issues, saddle-point approximation is not exact, and integration methods can be complicated. In the present paper we introduce a new probability measure that we refer to as the star probability measure as an alternative approach to compute the moment-generating function of normal variate functionals such as the lognormal distribution.

**Category:** Statistics

[81] **viXra:1706.0279 [pdf]**
*replaced on 2017-08-28 09:11:45*

**Authors:** Raymond H.V. Gallucci

**Comments:** 10 Pages.

Since publication of NUREG/CR-6850 (EPRI 1011989), EPRI/NRC-RES Fire PRA Methodology for Nuclear Power Facilities in 2005, phenomenological modeling of fire growth to peak heat release rate (HRR) for electrical enclosure fires in nuclear power plant probabilistic risk assessment (PRA) has typically assumed an average 12-minute rise time. [1] One previous analysis using the data from NUREG/CR-6850 from which this estimate derived (Gallucci, “Statistical Characterization of Cable Electrical Failure Temperatures Due to Fire, with Simulation of Failure Probabilities”) indicated that the time to peak HRR could be represented by a gamma distribution with alpha (shape) and beta (scale) parameters of 8.66 and 1.31, respectively. [2] Completion of the test program by the US Nuclear Regulatory Commission (USNRC) for electrical enclosure heat release rates, documented in NUREG/CR-7197, Heat Release Rates of Electrical Enclosure Fires (HELEN-FIRE) in 2016, has provided substantially more data from which to characterize this growth time to peak HRR. [3] From these, the author develops probabilistic distributions that enhance the original NUREG/CR-6850 results for both qualified (Q) and unqualified cables (UQ). The mean times to peak HRR are 13.3 and 10.1 min for Q and UQ cables, respectively, with a mean of 12.4 min when all data are combined, confirming that the original NUREG/CR-6850 estimate of 12 min was quite reasonable.
Via statistical-probabilistic analysis, the author shows that the time to peak HRR for Q and UQ cables can again be well represented by gamma distributions with alpha and beta parameters of 1.88 and 7.07, and 3.86 and 2.62, respectively. Working with the gamma distribution for All cables given the two cable types, the author performs simulations demonstrating that manual non-suppression probabilities, on average, are 30% and 10% higher than the use of a 12-min point estimate when the fire is assumed to be detected at its start and halfway between its start and the time it reaches its peak, respectively. This suggests that adopting a probabilistic approach enables more realistic modeling of this particular fire phenomenon (growth time).

**Category:** Statistics
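
The gamma parameters quoted in the abstract above can be checked against the reported mean times to peak HRR, since a gamma distribution with shape alpha and scale beta has mean alpha × beta. The Python sketch below does that check analytically and by simulation. It assumes beta is a scale parameter, as the abstract states; the names `PARAMS`, `gamma_mean` and `simulated_mean` are illustrative only, and the non-suppression simulations from the paper are not reproduced.

```python
import random

# (alpha, beta) = (shape, scale) as quoted in the abstract for each cable type.
PARAMS = {"Q": (1.88, 7.07), "UQ": (3.86, 2.62)}


def gamma_mean(alpha, beta):
    """Analytic mean of a gamma distribution with shape alpha and scale beta."""
    return alpha * beta


def simulated_mean(alpha, beta, n=100_000, seed=0):
    """Monte Carlo estimate of the mean time to peak HRR, in minutes."""
    rng = random.Random(seed)
    return sum(rng.gammavariate(alpha, beta) for _ in range(n)) / n


if __name__ == "__main__":
    for cable, (alpha, beta) in PARAMS.items():
        print(cable, round(gamma_mean(alpha, beta), 1),
              round(simulated_mean(alpha, beta), 1))
```

The analytic means come out at about 13.3 min for Q cables and 10.1 min for UQ cables, matching the figures given in the abstract.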

[80] **viXra:1705.0093 [pdf]**
*replaced on 2017-07-05 00:57:58*

**Authors:** L. Martino

**Comments:** IET Electronics Letters, Volume 53, Issue 16, Pages: 1115-1117, 2017

Monte Carlo (MC) methods have become very popular in signal processing during the past decades. The adaptive rejection sampling (ARS) algorithms are well-known MC techniques which efficiently draw independent samples from univariate target densities. The ARS schemes yield a sequence of proposal functions that converge toward the target, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. We propose the Parsimonious Adaptive Rejection Sampling (PARS) method, where an efficient trade-off between acceptance rate and proposal complexity is obtained. Thus, the resulting algorithm is faster than the standard ARS approach.

**Category:** Statistics

[79] **viXra:1705.0093 [pdf]**
*replaced on 2017-05-04 14:04:31*

**Authors:** L. Martino

**Comments:** 7 Pages.

Monte Carlo (MC) methods have become very popular in signal processing during the past decades. The adaptive rejection sampling (ARS) algorithms are well-known MC techniques which efficiently draw independent samples from univariate target densities. The ARS schemes yield a sequence of proposal functions that converge toward the target, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. We propose the Parsimonious Adaptive Rejection Sampling (PARS) method, where an efficient trade-off between acceptance rate and proposal complexity is obtained. Thus, the resulting algorithm is faster than the standard ARS approach.

**Category:** Statistics

[78] **viXra:1704.0277 [pdf]**
*replaced on 2017-06-02 14:03:25*

**Authors:** Zhicheng Chen

**Comments:** 4 Pages.

Distributions play a very important role in many applications. Inspired by the newly developed warping transformation of distributions, an indirect nonparametric distribution to distribution regression method is proposed in this article for distribution prediction. Additionally, a hybrid approach by fusing the predictions respectively obtained by the proposed method and the conventional method is further developed for reducing risk when the predictor is contaminated.

**Category:** Statistics

[77] **viXra:1704.0277 [pdf]**
*replaced on 2017-05-12 21:34:09*

**Authors:** Zhicheng Chen

**Comments:** 4 Pages.

Distributions play a very important role in many applications. Inspired by the newly developed warping transformation of distributions, an indirect nonparametric distribution to distribution regression method is proposed in this article for predicting correlated one-dimensional continuous probability density functions.

**Category:** Statistics

[76] **viXra:1704.0247 [pdf]**
*replaced on 2017-04-24 10:16:46*

**Authors:** Zhicheng Chen, Hui Li

**Comments:** 8 Pages.

In Structural Health Monitoring, there are usually many strain sensors installed in different places of a single structure. The raw measurement of a strain sensor is generally a mixed response caused by different excitations such as moving vehicle loads, ambient temperature, etc. Monitoring data collected by different strain sensors are usually correlated with each other, and the correlation structures of responses caused by different excitations for different sensor pairs are quite diverse and complex. In Structural Health Monitoring, quantitatively describing and modeling complicated dependence structures of strain data is very important in many applications. In this article, copulas are exploited to characterize dependence structures and construct joint distributions of monitoring strain data. The constructed joint distribution is also applied in missing data imputation.

**Category:** Statistics

[75] **viXra:1704.0063 [pdf]**
*replaced on 2017-04-28 15:44:45*

**Authors:** L. Martino, V. Elvira, G. Camps-Valls

**Comments:** 39 Pages.

Importance Sampling (IS) is a well-known Monte Carlo technique that approximates integrals involving a posterior distribution by means of weighted samples. In this work, we study the assignation of a single weighted sample which compresses the information contained in a population of weighted samples. Part of the theory that we present as Group Importance Sampling (GIS) has been employed implicitly in different works in the literature. The provided analysis yields several theoretical and practical consequences. For instance, we discuss the application of GIS into the Sequential Importance Resampling framework and show that Independent Multiple Try Metropolis schemes can be interpreted as a standard Metropolis-Hastings algorithm, following the GIS approach. We also introduce two novel Markov Chain Monte Carlo techniques based on GIS. The first one, named Group Metropolis Sampling method, produces a Markov chain of sets of weighted samples. All these sets are then employed for obtaining a unique global estimator. The second one is the Distributed Particle Metropolis-Hastings technique, where different parallel particle filters are jointly used to drive an MCMC algorithm. Different resampled trajectories are compared and then tested with a proper acceptance probability. The novel schemes are tested in different numerical experiments such as learning the hyperparameters of Gaussian Processes, the localization problem in a wireless sensor network and the tracking of vegetation parameters given satellite observations, where they are compared with several benchmark Monte Carlo techniques. Three illustrative Matlab demos are also provided.

**Category:** Statistics

[74] **viXra:1704.0063 [pdf]**
*replaced on 2017-04-17 08:30:11*

**Authors:** L. Martino, V. Elvira, G. Camps-Valls

**Comments:** 39 Pages. Related Matlab demos at https://github.com/lukafree/GIS.git

Importance Sampling (IS) is a well-known Monte Carlo technique that approximates integrals involving a posterior distribution by means of weighted samples. In this work, we study the assignation of a single weighted sample which compresses the information contained in a population of weighted samples. Part of the theory that we present as Group Importance Sampling (GIS) has been employed implicitly in different works in the literature. The provided analysis yields several theoretical and practical consequences. For instance, we discuss the application of GIS into the Sequential Importance Resampling framework and show that Independent Multiple Try Metropolis schemes can be interpreted as a standard Metropolis-Hastings algorithm, following the GIS approach. We also introduce two novel Markov Chain Monte Carlo techniques based on GIS. The first one, named Group Metropolis Sampling method, produces a Markov chain of sets of weighted samples. All these sets are then employed for obtaining a unique global estimator. The second one is the Distributed Particle Metropolis-Hastings technique, where different parallel particle filters are jointly used to drive an MCMC algorithm. Different resampled trajectories are compared and then tested with a proper acceptance probability. The novel schemes are tested in different numerical experiments such as learning the hyperparameters of Gaussian Processes, the localization problem in a wireless sensor network and the tracking of vegetation parameters given satellite observations, where they are compared with several benchmark Monte Carlo techniques. Three illustrative Matlab demos are also provided.

**Category:** Statistics

[73] **viXra:1704.0063 [pdf]**
*replaced on 2017-04-08 03:25:37*

**Authors:** L. Martino, V. Elvira, G. Camps-Valls

**Comments:** 32 Pages. Related Matlab demos at https://github.com/lukafree/GIS.git

Importance Sampling (IS) is a well-known Monte Carlo technique that approximates integrals involving a posterior distribution by means of weighted samples. In this work, we study the assignation of a single weighted sample which compresses the information contained in a population of weighted samples. Part of the theory that we present as Group Importance Sampling (GIS) has been already employed implicitly in different works in literature. The provided analysis yields several theoretical and practical consequences. For instance, we discuss the application of GIS into the Sequential Importance Resampling (SIR) framework and show that Independent Multiple Try Metropolis (I-MTM) schemes can be interpreted as a standard Metropolis-Hastings algorithm, following the GIS approach. We also introduce two novel Markov Chain Monte Carlo (MCMC) techniques based on GIS. The first one, named Group Metropolis Sampling (GMS) method, produces a Markov chain of sets of weighted samples. All these sets are then employed for obtaining a unique global estimator. The second one is the Distributed Particle Metropolis-Hastings (DPMH) technique, where different parallel particle filters are jointly used to drive an MCMC algorithm. Different resampled trajectories are compared and then tested with a proper acceptance probability. The novel schemes are tested in different numerical experiments such as learning the hyperparameters of Gaussian Processes (GP), the localization problem in a sensor network and the tracking of the Leaf Area Index (LAI), where they are compared with several benchmark Monte Carlo techniques. Three descriptive Matlab demos are also provided.

**Category:** Statistics

[72] **viXra:1703.0203 [pdf]**
*replaced on 2017-04-05 13:22:39*

**Authors:** Raymond HV Gallucci

**Comments:** 84 Pages.

Situational Underlying Value (SUV) arose from an attempt to develop an all-encompassing statistic for measuring “clutchiness” for individual baseball players. It was to be based on the “run expectancy” concept, whereby each base with a certain number of outs is “worth” some fraction of a run. Hitters/runners reaching these bases would acquire the “worth” of that base, with the “worth” being earned by the hitter if he reached a base or advanced a runner, or the runner himself if he advanced “on his own” (e.g., stolen base, wild pitch). After several iterations, the version for SUV Baseball presented herein evolved, and it is demonstrated via two games. Subsequently, the concept was extended to professional football and NCAA Men’s Basketball, both with two example games highlighting selected individual players. As with Major League Baseball, these are team games where individual performance may be hard to gauge with a single statistic. This is the goal of SUV, which can be used as a measure both for the team and individual players.

**Category:** Statistics

[71] **viXra:1701.0420 [pdf]**
*replaced on 2017-09-24 12:48:50*

**Authors:** Nikhil Shaw

**Comments:** 6 Pages.

This paper proposes an alternative approach to the selection problem that is comparable to the best-known algorithm, Quickselect. In computer science, a selection algorithm is an algorithm for finding the Kth smallest number in an unordered list or array. Selection is a subproblem of more complex problems like the nearest neighbor and shortest path problems. Previously known approaches work on the same principle to optimize the sorting algorithm and return the Kth element. This algorithm uses a window method to prune and compare numbers to find the Kth smallest element. The average time complexity of the algorithm is linear, and its worst case is O(n^2).

**Category:** Statistics
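
For reference, the Quickselect baseline named in the abstract above can be sketched in a few lines of Python. This is the standard algorithm with a random pivot, not the window method proposed in the paper; the function name, the 1-based `k`, and the optional `rng` argument are choices made for this example.

```python
import random


def quickselect(items, k, rng=None):
    """Return the k-th smallest element of items (k is 1-based).

    Average running time is linear; the worst case is quadratic when
    the random pivots are repeatedly unlucky.
    """
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    rng = rng or random.Random()
    pool = list(items)
    while True:
        pivot = rng.choice(pool)
        smaller = [x for x in pool if x < pivot]
        equal = [x for x in pool if x == pivot]
        if k <= len(smaller):
            # The answer lies strictly below the pivot.
            pool = smaller
        elif k <= len(smaller) + len(equal):
            # The pivot itself occupies rank k.
            return pivot
        else:
            # Discard everything up to and including the pivot.
            k -= len(smaller) + len(equal)
            pool = [x for x in pool if x > pivot]
```

For example, `quickselect([7, 2, 9, 4, 4, 1], 3)` returns `4`, the third smallest value.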

[70] **viXra:1609.0230 [pdf]**
*replaced on 2017-12-21 05:20:57*

**Authors:** L. Martino, V. Elvira, G. Camps-Valls

**Comments:** 30 Pages. published in Digital Signal Processing, Volume 74, 2018

Monte Carlo methods are essential tools for Bayesian inference. Gibbs sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively used in signal processing, machine learning, and statistics, employed to draw samples from complicated high-dimensional posterior distributions. The key point for the successful application of the Gibbs sampler is the ability to efficiently draw samples from the full-conditional probability density functions. Since in the general case this is not possible, in order to speed up the convergence of the chain, it is required to generate auxiliary samples whose information is eventually disregarded. In this work, we show that these auxiliary samples can be recycled within the Gibbs estimators, improving their efficiency with no extra cost. This novel scheme arises naturally after pointing out the relationship between the standard Gibbs sampler and the chain rule used for sampling purposes. Numerical simulations involving simple and real inference problems confirm the excellent performance of the proposed scheme in terms of accuracy and computational efficiency. In particular we give empirical evidence of performance in a toy example, inference of Gaussian process hyperparameters, and learning dependence graphs through regression.

**Category:** Statistics

[69] **viXra:1609.0230 [pdf]**
*replaced on 2017-12-20 08:13:53*

**Authors:** L. Martino, V. Elvira, G. Camps-Valls

**Comments:** 30 Pages. published in Digital Signal Processing, 2017

Monte Carlo methods are essential tools for Bayesian inference. Gibbs sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively used in signal processing, machine learning, and statistics, employed to draw samples from complicated high-dimensional posterior distributions. The key point for the successful application of the Gibbs sampler is the ability to efficiently draw samples from the full-conditional probability density functions. Since in the general case this is not possible, in order to speed up the convergence of the chain, it is required to generate auxiliary samples whose information is eventually disregarded. In this work, we show that these auxiliary samples can be recycled within the Gibbs estimators, improving their efficiency with no extra cost. This novel scheme arises naturally after pointing out the relationship between the standard Gibbs sampler and the chain rule used for sampling purposes. Numerical simulations involving simple and real inference problems confirm the excellent performance of the proposed scheme in terms of accuracy and computational efficiency. In particular we give empirical evidence of performance in a toy example, inference of Gaussian process hyperparameters, and learning dependence graphs through regression.

**Category:** Statistics

[68] **viXra:1609.0230 [pdf]**
*replaced on 2016-11-21 10:18:22*

**Authors:** L. Martino, V. Elvira, G. Camps-Valls

**Comments:** 26 Pages. The MATLAB code of the numerical examples is provided at http://isp.uv.es/code/RG.zip.

Monte Carlo methods are essential tools for Bayesian inference. Gibbs sampling is a well-known Markov chain Monte Carlo (MCMC) algorithm, extensively used in signal processing, machine learning, and statistics, employed to draw samples from complicated high-dimensional posterior distributions. The key point for the successful application of the Gibbs sampler is the ability to efficiently draw samples from the full-conditional probability density functions. Since in the general case this is not possible, in order to speed up the convergence of the chain, it is required to generate auxiliary samples whose information is eventually disregarded. In this work, we show that these auxiliary samples can be recycled within the Gibbs estimators, improving their efficiency with no extra cost. This novel scheme arises naturally after pointing out the relationship between the standard Gibbs sampler and the chain rule used for sampling purposes. Numerical simulations involving simple and real inference problems confirm the excellent performance of the proposed scheme in terms of accuracy and computational efficiency. In particular we give empirical evidence of performance in a toy example, inference of Gaussian process hyperparameters, and learning dependence graphs through regression.

**Category:** Statistics

[67] **viXra:1608.0403 [pdf]**
*replaced on 2016-10-22 08:45:38*

**Authors:** Sascha Vongehr

**Comments:** 8 pages, 2 figures, 26 references

Ashkenazim Jews (AJ) comprise roughly 30% of Nobel Prize winners, ‘elite institute’ faculty, etc. Mean intelligence quotients (IQ) fail to explain this, because AJ are only 2.2% of the US population; the maximum possible would be 13% high achievement and needing IQs above 165. The growing anti-Semitic right wing supports conspiracy theories with this. However, standard deviations (SD) depend on means. An AJ-SD of 17 is still lower than the coefficient of variation suggests, but lifts the right wing of the AJ-IQ distribution sufficiently to account for high achievement. We do not assume threshold IQs or smart fractions. Alternative mechanisms such as intellectual AJ culture or ethnocentrism must be regarded as included through their IQ-dependence. Antisemitism is thus opposed in its own domain of discourse; it is an anti-intelligence position inconsistent with eugenics. We discuss the relevance for ‘social sciences’ as sciences and that human intelligence co-evolved for (self-)deception.

**Category:** Statistics

[66] **viXra:1606.0130 [pdf]**
*replaced on 2016-10-07 13:43:28*

**Authors:** Raymond H.V. Gallucci, Brian Metzger

**Comments:** 19 Pages. Replaces previous version

Since the publication of NUREG/CR-6850 / EPRI 1011989 in 2005, the US nuclear industry has sought to re-evaluate the default peak heat release rates (HRRs) for electrical enclosure fires typically used as fire modeling inputs to support fire probabilistic risk assessments (PRAs), considering them too conservative. HRRs are an integral part of the fire phenomenological modeling phase of a fire PRA, which consists of identifying fire scenarios which can damage equipment or hinder human actions necessary to prevent core damage. Fire ignition frequency, fire growth and propagation, fire detection and suppression, and mitigating equipment and actions to prevent core damage in the event fire damage still occurred are all parts of a fire PRA. The fire growth and propagation phase incorporates fire phenomenological modeling where HRRs have a key effect. A major effort by the Electric Power Research Institute and Science Applications International Corporation in 2012 was not endorsed by the US Nuclear Regulatory Commission (NRC) for use in risk-informed, regulatory applications. Subsequently the NRC, in conjunction with the National Institute of Standards and Technology, conducted a series of tests for representative nuclear power plant electrical enclosure fires designed to definitively establish more realistic peak HRRs for these often important contributors to fire risk. The results from these tests are statistically analyzed to develop two probabilistic distributions for peak HRR per unit mass of fuel that refine the values from NUREG/CR-6850, thereby providing a fairly simple means by which to estimate peak HRRs from electrical enclosure fires for fire modeling in support of fire PRA. Unlike NUREG/CR-6850, where five different distributions are provided, or NUREG-2178, which now provides 31, the peak HRRs for electrical enclosure fires can be characterized by only two distributions. These distributions depend only on the type of cable, namely qualified vs. unqualified, for which the mean peak HRR per unit mass is 11.3 and 23.2 kW/kg, respectively, essentially a factor of two difference. Two-sided, 90th percentile confidence bounds are 0.091 to 41.15 kW/kg for qualified cables, and 0.027 to 95.93 kW/kg for unqualified cables. From the mean (~70th percentile) upward, the peak HRR/kg for unqualified cables is roughly twice that for qualified, increasing slightly with higher percentile, an expected phenomenological trend. Simulations using variable fuel loadings are performed to demonstrate how the results from this analysis may be used for nuclear power plant applications.

**Category:** Statistics

[65] **viXra:1604.0009 [pdf]**
*replaced on 2017-02-25 01:17:19*

**Authors:** Ioannis Koukoutsidis

**Comments:** 31 Pages. A short version of this article was published in Proceedings of SENSORNETS 2017 (February 2017). The final corrected version was published in ACM Transactions on Sensor Networks (TOSN), Vol. 14, Issue 1, Art. 2, December 2017

Mobile crowdsensing can facilitate environmental surveys by leveraging sensor-equipped mobile devices that carry out measurements covering a wide area in a short time, without bearing the costs of traditional field work. In this paper, we examine statistical methods to perform an accurate estimate of the mean value of an environmental parameter in a region, based on such measurements. The main focus is on estimates produced by considering the mobile device readings at a random instant in time. We compare stratified sampling with different stratification weights to sampling without stratification, as well as an appropriately modified version of systematic sampling. Our main result is that stratification with weights proportional to stratum areas can produce significantly smaller bias, and gets arbitrarily close to the true area average as the number of mobiles increases, for a moderate number of strata. The performance of the methods is evaluated for an application scenario where we estimate the mean area temperature in a linear region that exhibits the so-called *Urban Heat Island* effect, with mobile users moving in the region according to the Random Waypoint Model.

**Category:** Statistics
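
The comparison between stratified and unstratified estimates in the abstract above can be illustrated with a small Python sketch. It shows only the basic estimator with weights proportional to stratum areas; the function names, the toy temperature readings, and the rule of skipping empty strata and renormalizing the weights are assumptions for this example, not details taken from the paper.

```python
def stratified_mean(readings_by_stratum, stratum_areas):
    """Area-weighted average of the per-stratum sample means.

    Strata without any readings are skipped and the remaining weights
    are renormalized (an assumption made for this sketch).
    """
    weighted_sum = 0.0
    used_area = 0.0
    for readings, area in zip(readings_by_stratum, stratum_areas):
        if readings:
            weighted_sum += area * (sum(readings) / len(readings))
            used_area += area
    if used_area == 0:
        raise ValueError("no readings in any stratum")
    return weighted_sum / used_area


def unstratified_mean(readings_by_stratum):
    """Plain average over all readings, ignoring where they were taken."""
    flat = [r for readings in readings_by_stratum for r in readings]
    return sum(flat) / len(flat)


# Toy case: two strata of equal area, with mobiles crowded into the warm one.
readings = [[30.0, 30.0, 30.0, 30.0], [20.0]]
areas = [1.0, 1.0]
print(stratified_mean(readings, areas))   # 25.0
print(unstratified_mean(readings))        # 28.0
```

In this toy case the true area average is 25, so the unstratified estimate is pulled toward the crowded warm stratum while the area-weighted estimate is not.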

[64] **viXra:1603.0215 [pdf]**
*replaced on 2016-03-17 17:28:08*

**Authors:** Glenn Healey

**Comments:** 14 Pages.

Given a set of observed batted balls and their outcomes, we develop a method for learning the dependence of a batted ball’s intrinsic value on its measured parameters.

**Category:** Statistics

[63] **viXra:1603.0180 [pdf]**
*replaced on 2016-03-14 15:52:37*

**Authors:** Luca Martino, Jorge Plata-Chaves, Francisco Louzada

**Comments:** 5 Pages.

In this work, we design an efficient Monte Carlo scheme for a node-specific inference problem where a vector of global parameters and multiple vectors of local parameters are involved. This scenario often appears in inference problems over heterogeneous wireless sensor networks where each node performs observations dependent on a vector of global parameters as well as a vector of local parameters. The proposed scheme uses parallel local MCMC chains and then an importance sampling (IS) fusion step that leverages all the observations of all the nodes when estimating the global parameters. The resulting algorithm is simple and flexible. It can be easily applied iteratively, or extended in a sequential framework.

**Category:** Statistics

[62] **viXra:1603.0180 [pdf]**
*replaced on 2016-03-13 11:19:11*

**Authors:** Luca Martino, Jorge Plata-Chaves, Francisco Louzada

**Comments:** 5 Pages.

In this work, we design an efficient Monte Carlo scheme for a node-specific inference problem where a vector of global parameters and multiple vectors of local parameters are involved. This scenario often appears in inference problems over heterogeneous wireless sensor networks where each node performs observations dependent on a vector of global parameters as well as a vector of local parameters. The proposed scheme uses parallel local MCMC chains and then an importance sampling (IS) fusion step that leverages all the observations of all the nodes when estimating the global parameters. The resulting algorithm is simple and flexible. It can be easily applied iteratively, or extended in a sequential framework.

**Category:** Statistics

[61] **viXra:1603.0180 [pdf]**
*replaced on 2016-03-12 06:01:27*

**Authors:** Luca Martino, Jorge Plata-Chaves, Francisco Louzada

**Comments:** 5 Pages.

In this work, we design an efficient Monte Carlo scheme for a node-specific inference problem where a vector of global parameters and multiple vectors of local parameters are involved. This scenario often appears in inference problems over heterogeneous wireless sensor networks where each node performs observations dependent on a vector of global parameters as well as a vector of local parameters. The proposed scheme uses parallel local MCMC chains and then an importance sampling (IS) fusion step that leverages all the observations of all the nodes when estimating the global parameters. The resulting algorithm is simple and flexible. It can be easily applied iteratively, or extended in a sequential framework.

**Category:** Statistics

[60] **viXra:1602.0333 [pdf]**
*replaced on 2017-03-07 04:06:37*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 9 Pages.

The Sequential Importance Resampling (SIR) method is the core of the Sequential Monte Carlo (SMC) algorithms (a.k.a., particle filters). In this work, we point out a suitable choice for weighting properly a resampled particle. This observation entails several theoretical and practical consequences, allowing also the design of novel sampling schemes. Specifically, we describe one theoretical result about the sequential estimation of the marginal likelihood. Moreover, we suggest a novel resampling procedure for SMC algorithms called partial resampling, involving only a subset of the current cloud of particles. Clearly, this scheme attenuates the additional variance in the Monte Carlo estimators generated by the use of the resampling.

**Category:** Statistics

[59] **viXra:1602.0333 [pdf]**
*replaced on 2016-10-21 05:07:13*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 9 Pages.

The Sequential Importance Resampling (SIR) method is the core of the Sequential Monte Carlo (SMC) algorithms (a.k.a., particle filters). In this work, we point out a suitable choice for weighting properly a resampled particle. This observation entails several theoretical and practical consequences, allowing also the design of novel sampling schemes. Specifically, we describe one theoretical result about the sequential estimation of the marginal likelihood. Moreover, we suggest a novel resampling procedure for SMC algorithms called partial resampling, involving only a subset of the current cloud of particles. Clearly, this scheme attenuates the additional variance in the Monte Carlo estimators generated by the use of the resampling.

**Category:** Statistics

[58] **viXra:1602.0333 [pdf]**
*replaced on 2016-06-15 02:55:00*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 9 Pages. This is an extended version of the work: L. Martino, V. Elvira, F. Louzada, "Weighting a Resampled Particle in Sequential Monte Carlo", IEEE Statistical Signal Processing Workshop (SSP), 2016.

**Category:** Statistics

[57] **viXra:1602.0333 [pdf]**
*replaced on 2016-06-13 04:06:23*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 9 Pages. This is an extended version of the work: L. Martino, V. Elvira, F. Louzada, "Weighting a Resampled Particle in Sequential Monte Carlo", IEEE Statistical Signal Processing Workshop (SSP), 2016.

**Category:** Statistics

[56] **viXra:1602.0333 [pdf]**
*replaced on 2016-05-10 08:15:27*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 5 Pages.

**Category:** Statistics

[55] **viXra:1602.0112 [pdf]**
*replaced on 2016-09-23 03:15:35*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** Signal Processing, Volume 131, Pages: 386-401, 2017

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation $\widehat{ESS}$ of the theoretical ESS definition is widely applied, involving the inverse of the sum of the squares of the normalized importance weights. This formula, $\widehat{ESS}$, has become an essential piece within Sequential Monte Carlo (SMC) methods, to assess the convenience of a resampling step. From another perspective, the expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric mean of the weights, the discrete entropy (including the *perplexity* measure, already proposed in literature) and the Gini coefficient among others. We list five theoretical requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.
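
As a rough illustration of the quantities named in this abstract, the sketch below computes the standard approximation $\widehat{ESS}$ as the inverse of the sum of the squared normalized weights, alongside the perplexity (the exponential of the discrete entropy of the normalized weights). The function names are hypothetical and are not taken from the paper, whose own definitions and additional measures may differ in detail.

```python
import math


def normalize(weights):
    """Return the weights divided by their sum (assumes a positive sum)."""
    total = sum(weights)
    return [w / total for w in weights]


def ess_inverse_sum_squares(weights):
    """Standard approximation: 1 / sum of squared normalized weights."""
    wbar = normalize(weights)
    return 1.0 / sum(w * w for w in wbar)


def ess_perplexity(weights):
    """Perplexity: exp of the discrete entropy of the normalized weights."""
    wbar = normalize(weights)
    # Terms with zero weight contribute nothing to the entropy.
    entropy = -sum(w * math.log(w) for w in wbar if w > 0.0)
    return math.exp(entropy)
```

Both functions return the number of samples N for uniform weights and 1 when a single weight carries all the mass, which are the two extreme cases any such measure is usually expected to reproduce.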

**Category:** Statistics

[54] **viXra:1602.0112 [pdf]**
*replaced on 2016-03-05 09:11:03*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 32 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation $\widehat{ESS}$ of the theoretical ESS definition is widely applied, involving the inverse of the sum of the squares of the normalized importance weights. This formula, $\widehat{ESS}$, has become an essential piece within Sequential Monte Carlo (SMC) methods, to assess the convenience of a resampling step. From another perspective, the expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient among others. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.

**Category:** Statistics

[53] **viXra:1602.0112 [pdf]**
*replaced on 2016-02-20 06:30:34*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 31 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation $\widehat{ESS}$ of the theoretical ESS definition is widely applied, involving the inverse of the sum of the squares of the normalized importance weights. This formula, $\widehat{ESS}$, has become an essential piece within Sequential Monte Carlo (SMC) methods, to assess the convenience of a resampling step. From another perspective, the expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient among others. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.

**Category:** Statistics

[52] **viXra:1602.0112 [pdf]**
*replaced on 2016-02-19 04:23:27*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 31 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation $\widehat{ESS}$ of the theoretical ESS definition is widely applied, involving the sum of the squares of the normalized importance weights. This formula, $\widehat{ESS}$, has become an essential piece within Sequential Monte Carlo (SMC) methods, to assess the convenience of a resampling step. From another perspective, the expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient among others. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.

**Category:** Statistics

[51] **viXra:1602.0112 [pdf]**
*replaced on 2016-02-14 08:13:03*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 31 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation $\widehat{ESS}$ of the theoretical ESS definition is widely applied, involving the sum of the squares of the normalized importance weights. This formula, $\widehat{ESS}$, has become an essential piece within Sequential Monte Carlo (SMC) methods, to assess the convenience of a resampling step. From another perspective, the expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these two pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the *perplexity* measure, already proposed in literature) and the Gini coefficient among others. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.

**Category:** Statistics

[50] **viXra:1602.0112 [pdf]**
*replaced on 2016-02-10 07:48:50*

**Authors:** L. Martino, V. Elvira, F. Louzada

**Comments:** 31 Pages.

The Effective Sample Size (ESS) is an important measure of efficiency of Monte Carlo methods such as Markov Chain Monte Carlo (MCMC) and Importance Sampling (IS) techniques. In the IS context, an approximation of the theoretical ESS definition is widely applied, $\widehat{ESS}$, involving the sum of the squares of the normalized importance weights. This formula, $\widehat{ESS}$, has become an essential piece within Sequential Monte Carlo (SMC) methods using adaptive resampling procedures. The expression $\widehat{ESS}$ is related to the Euclidean distance between the probability mass described by the normalized weights and the discrete uniform probability mass function (pmf). In this work, we derive other possible ESS functions based on different discrepancy measures between these pmfs. Several examples are provided involving, for instance, the geometric and harmonic means of the weights, the discrete entropy (including the perplexity measure, already proposed in literature) and the Gini coefficient. We list five requirements which a generic ESS function should satisfy, allowing us to classify different ESS measures. We also compare the most promising ones by means of numerical simulations.

**Category:** Statistics

[49] **viXra:1602.0053 [pdf]**
*replaced on 2016-02-05 08:42:31*

**Authors:** Jason Lind

**Comments:** 3 Pages. Added preliminary calculations for correcting non-normal distribution

Defines a rated set and uses it to calculate a weight directly from the statistics that enables a broad unified interpretation of data.

**Category:** Statistics

[48] **viXra:1602.0053 [pdf]**
*replaced on 2016-02-05 03:29:44*

**Authors:** Jason Lind

**Comments:** Corrected table on page 2

Defines a rated set and uses it to calculate a weight directly from the statistics that enables a broad unified interpretation of data.

**Category:** Statistics

[47] **viXra:1601.0174 [pdf]**
*replaced on 2016-07-15 02:12:10*

**Authors:** V. Elvira, L. Martino, D. Luengo, M. F. Bugallo

**Comments:** 30 Pages.

Population Monte Carlo (PMC) sampling methods are powerful tools for approximating distributions of static unknowns given a set of observations. These methods are iterative in nature: at each step they generate samples from a proposal distribution and assign them weights according to the importance sampling principle. Critical issues in applying PMC methods are the choice of the generating functions for the samples and the avoidance of the sample degeneracy. In this paper, we propose three new schemes that considerably improve the performance of the original PMC formulation by allowing for better exploration of the space of unknowns and by selecting more adequately the surviving samples. A theoretical analysis is performed, proving the superiority of the novel schemes in terms of variance of the associated estimators and preservation of the sample diversity. Furthermore, we show that they outperform other state of the art algorithms (both in terms of mean square error and robustness w.r.t. initialization) through extensive numerical simulations.
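
The iterative structure described in this abstract (draw from proposals, weight by the importance sampling principle, select the surviving samples) can be sketched as follows. This is a minimal sketch of a basic PMC loop with Gaussian proposals, not of the three new schemes proposed in the paper; the function names, the fixed proposal scale `sigma`, and the use of multinomial resampling to relocate the proposals are all assumptions made for illustration.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def basic_pmc(log_target, init_means, sigma, n_iters, rng):
    """Run a basic PMC loop; return all samples and their IS weights."""
    means = list(init_means)
    all_samples, all_weights = [], []
    for _ in range(n_iters):
        # One draw per proposal, each proposal centred at its current mean.
        xs = [rng.gauss(m, sigma) for m in means]
        # Importance weight: (unnormalized) target over the proposal used.
        ws = [math.exp(log_target(x)) / gauss_pdf(x, m, sigma)
              for x, m in zip(xs, means)]
        all_samples.extend(xs)
        all_weights.extend(ws)
        # Surviving samples become the proposal means of the next iteration.
        if sum(ws) > 0.0:
            means = rng.choices(xs, weights=ws, k=len(means))
    return all_samples, all_weights


def weighted_mean(samples, weights):
    """Self-normalized importance sampling estimate of the target mean."""
    return sum(w * x for x, w in zip(samples, weights)) / sum(weights)
```

Resampling in this way concentrates the proposals where the target has mass, at the cost of losing diversity among them, which is the sample degeneracy issue the abstract refers to.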

**Category:** Statistics

[46] **viXra:1601.0167 [pdf]**
*replaced on 2017-11-05 10:44:50*

**Authors:** Ilija Barukčić

**Comments:** 10 pages. Copyright © 2006 by Ilija Barukčić, Jever, Germany. All rights reserved. Published by:

Titans like Bertrand Russell or Karl Pearson warned us to keep our mathematical and statistical hands off causality and, in the end, so did David Hume. Hume's scepticism has dominated discussion of causality in both analytic philosophy and statistical analysis for a long time. But more and more researchers are working hard in this field and trying to get rid of these positions. In so far, much of the recent philosophical or mathematical writing on causation (Ellery Eells (1991), Daniel Hausman (1998), Pearl (2000), Peter Spirtes, Clark Glymour and Richard Scheines (2000), ...) addresses either Bayes networks, the counterfactual approach to causality developed in detail by David Lewis, Reichenbach's Principle of the Common Cause or the Causal Markov Condition. None of these approaches to causation has investigated the relationship between causation and the law of independence to a necessary extent. Nonetheless, the relationship between causation and the law of independence, one of the fundamental concepts in probability theory, is very important. May an effect occur in the absence of a cause? May an effect fail to occur in the presence of a cause? In so far, what does constitute the causal relation? On the other hand, if it is unclear what does constitute the causal relation, maybe we can answer the question of what does not constitute the causal relation. So far, a cause as such cannot be independent of its effect and vice versa, if there is a deterministic causal relationship. This publication will prove that the law of independence defines causation to some extent ex negativo.

**Category:** Statistics

[45] **viXra:1512.0420 [pdf]**
*replaced on 2016-09-23 03:50:26*

**Authors:** L. Martino, J. Read, V. Elvira, F. Louzada

**Comments:** 30 Pages. (accepted; to appear) Digital Signal Processing

We design a sequential Monte Carlo scheme for the dual purpose of Bayesian inference and model selection. We consider the application context of urban mobility, where several modalities of transport and different measurement devices can be employed. Therefore, we address the joint problem of online tracking and detection of the current modality. For this purpose, we use interacting parallel particle filters, each one addressing a different model. They cooperate for providing a global estimator of the variable of interest and, at the same time, an approximation of the posterior density of each model given the data. The interaction occurs by a parsimonious distribution of the computational effort, with online adaptation for the number of particles of each filter according to the posterior probability of the corresponding model. The resulting scheme is simple and flexible. We have tested the novel technique in different numerical experiments with artificial and real data, which confirm the robustness of the proposed scheme.

**Category:** Statistics

[44] **viXra:1512.0420 [pdf]**
*replaced on 2015-12-26 13:02:26*

**Authors:** L. Martino, J. Read, V. Elvira, F. Louzada

**Comments:** 21 Pages.

We design a sequential Monte Carlo scheme for the joint purpose of Bayesian inference and model selection, with application to urban mobility context where different modalities of transport and measurement devices can be employed. In this case, we have the joint problem of online tracking and detection of the current modality. For this purpose, we use interacting parallel particle filters each one addressing a different model. They cooperate for providing a global estimator of the variable of interest and, at the same time, an approximation of the posterior density of the models given the data. The interaction occurs by a parsimonious distribution of the computational effort, adapting on-line the number of particles of each filter according to the posterior probability of the corresponding model. The resulting scheme is simple and flexible. We have tested the novel technique in different numerical experiments with artificial and real data, which confirm the robustness of the proposed scheme.

**Category:** Statistics

[43] **viXra:1509.0048 [pdf]**
*replaced on 2017-10-08 05:28:48*

**Authors:** L. Martino, F. Louzada

**Comments:** 15 Pages. (to appear) Communications in Statistics - Simulation and Computation

The adaptive rejection sampling (ARS) algorithm is a universal random generator for drawing samples efficiently from a univariate log-concave target probability density function (pdf). ARS generates independent samples from the target via rejection sampling with high acceptance rates. Indeed, ARS yields a sequence of proposal functions that converge toward the target pdf, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. In this work, we propose a novel ARS scheme, called Cheap Adaptive Rejection Sampling (CARS), where the computational effort for drawing from the proposal remains constant, decided in advance by the user. For generating a large number of desired samples, CARS is faster than ARS.
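
To make the underlying mechanism concrete, here is a minimal sketch of plain (non-adaptive) rejection sampling with a fixed proposal. It is neither ARS nor CARS, since the proposal is never updated; it only shows the accept/reject test and the acceptance rate that the adaptive schemes try to push toward one. The log-concave target (an unnormalized standard Gaussian), the Laplace proposal, and all names are assumptions chosen for illustration.

```python
import math
import random


def target(x):
    """Unnormalized log-concave target: standard Gaussian kernel."""
    return math.exp(-0.5 * x * x)


def laplace_pdf(x):
    """Density of the Laplace(0, 1) proposal."""
    return 0.5 * math.exp(-abs(x))


def laplace_draw(rng):
    """Draw from Laplace(0, 1): exponential magnitude with a random sign."""
    magnitude = rng.expovariate(1.0)
    return magnitude if rng.random() < 0.5 else -magnitude


# Smallest constant M with target(x) <= M * laplace_pdf(x) for all x;
# the ratio target/laplace_pdf peaks at |x| = 1.
BOUND = 2.0 * math.exp(0.5)


def rejection_sample(n_samples, rng):
    """Return n_samples accepted draws and the empirical acceptance rate."""
    samples, n_tries = [], 0
    while len(samples) < n_samples:
        x = laplace_draw(rng)
        n_tries += 1
        # Accept x with probability target(x) / (M * proposal(x)).
        if rng.random() * BOUND * laplace_pdf(x) < target(x):
            samples.append(x)
    return samples, n_samples / n_tries
```

With this fixed proposal, roughly three out of four candidates are accepted; a proposal that tracks the target more closely raises that rate but is more expensive to draw from, which is the trade-off the abstract addresses.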

**Category:** Statistics

[42] **viXra:1508.0142 [pdf]**
*replaced on 2016-02-24 08:21:59*

**Authors:** L. Martino, F. Louzada

**Comments:** 15 Pages. To appear in Computational Statistics

The Multiple Try Metropolis (MTM) algorithm is an advanced MCMC technique based on drawing and testing several candidates at each iteration of the algorithm. One of them is selected according to certain weights and then it is tested according to a suitable acceptance probability. Clearly, since the computational cost increases as the employed number of tries grows, one expects that the performance of an MTM scheme improves as the number of tries increases, as well. However, there are scenarios where increasing the number of tries does not produce a corresponding improvement in performance. In this work, we describe these scenarios and then we introduce possible solutions to these issues.
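
The draw, select, and test steps described in this abstract can be sketched for one common special case: a symmetric Gaussian random-walk proposal with candidate weights equal to the target density. This is a minimal sketch under those assumptions, not the specific variants studied in the paper; the function name and its parameters are hypothetical.

```python
import math
import random


def mtm_step(x, log_target, n_tries, step, rng):
    """One MTM transition from state x; returns the next state."""
    # 1. Draw several candidates from a symmetric proposal around x.
    cands = [x + rng.gauss(0.0, step) for _ in range(n_tries)]
    w_cands = [math.exp(log_target(y)) for y in cands]
    total_cands = sum(w_cands)
    if total_cands == 0.0:
        return x  # every candidate has negligible density: stay put
    # 2. Select one candidate with probability proportional to its weight.
    y = rng.choices(cands, weights=w_cands, k=1)[0]
    # 3. Draw n_tries - 1 reference points around y and add the current x.
    refs = [y + rng.gauss(0.0, step) for _ in range(n_tries - 1)] + [x]
    total_refs = sum(math.exp(log_target(z)) for z in refs)
    # 4. Accept y with the generalized Metropolis ratio of summed weights.
    if rng.random() < min(1.0, total_cands / total_refs):
        return y
    return x
```

Each transition costs about `2 * n_tries` target evaluations, so a larger number of tries only pays off if the chain mixes correspondingly faster.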

**Category:** Statistics

[41] **viXra:1508.0142 [pdf]**
*replaced on 2015-08-19 03:39:57*

**Authors:** L. Martino, F. Louzada

**Comments:** 17 Pages.

The Multiple Try Metropolis (MTM) algorithm is an advanced MCMC technique based on drawing and testing several candidates at each iteration of the algorithm. One of them is selected according to certain weights and then it is tested according to a suitable acceptance probability. Clearly, since the computational cost increases as the employed number of tries grows, one expects that the performance of an MTM scheme improves as the number of tries increases, as well. However, there are scenarios where increasing the number of tries does not produce a corresponding improvement in performance. In this work, we describe these scenarios and then we introduce possible solutions to these issues.

**Category:** Statistics

[40] **viXra:1507.0110 [pdf]**
*replaced on 2016-09-23 04:05:02*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander, F. Louzada

**Comments:** Digital Signal Processing Volume 58, Pages: 64-84, 2016.

Monte Carlo (MC) methods are widely used for Bayesian inference and optimization in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called *orthogonal MCMC* (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes in order to reduce the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and the choice of the parameters.

**Category:** Statistics

[39] **viXra:1507.0110 [pdf]**
*replaced on 2015-07-30 08:34:32*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander, F. Louzada

**Comments:** 25 Pages.

Monte Carlo (MC) methods are widely used in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes for reducing the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. We also discuss the application of O-MCMC in a big data framework. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and parameter choice.

**Category:** Statistics

[38] **viXra:1507.0110 [pdf]**
*replaced on 2015-07-28 23:03:29*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander, F. Louzada

**Comments:** 24 Pages.

Monte Carlo (MC) methods are widely used in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes for reducing the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. We also discuss the application of O-MCMC in a big data framework. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and parameter choice.

**Category:** Statistics

[37] **viXra:1507.0110 [pdf]**
*replaced on 2015-07-28 08:47:05*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander, F. Louzada

**Comments:** 24 Pages.

Monte Carlo (MC) methods are widely used in statistics, signal processing and machine learning. A well-known class of MC methods are Markov Chain Monte Carlo (MCMC) algorithms. In order to foster better exploration of the state space, especially in high-dimensional applications, several schemes employing multiple parallel MCMC chains have been recently introduced. In this work, we describe a novel parallel interacting MCMC scheme, called orthogonal MCMC (O-MCMC), where a set of "vertical" parallel MCMC chains share information using some "horizontal" MCMC techniques working on the entire population of current states. More specifically, the vertical chains are led by random-walk proposals, whereas the horizontal MCMC techniques employ independent proposals, thus allowing an efficient combination of global exploration and local approximation. The interaction is contained in these horizontal iterations. Within the analysis of different implementations of O-MCMC, novel schemes for reducing the overall computational cost of parallel multiple try Metropolis (MTM) chains are also presented. Furthermore, a modified version of O-MCMC for optimization is provided by considering parallel simulated annealing (SA) algorithms. We also discuss the application of O-MCMC in a big data framework. Numerical results show the advantages of the proposed sampling scheme in terms of efficiency in the estimation, as well as robustness in terms of independence with respect to initial values and parameter choice.

**Category:** Statistics

[36] **viXra:1506.0175 [pdf]**
*replaced on 2015-10-04 03:38:05*

**Authors:** Ilija Barukčić

**Comments:** 19 Pages. (C) Ilija Barukčić, Jever, Germany, 2015. Published by: International Journal of Applied Physics and Mathematics vol. 6, no. 2, pp. 45-65, 2016. http://dx.doi.org/10.17706/ijapm.2016.6.2.45-65

The deterministic relationship between cause and effect is deeply connected with our understanding of the physical sciences and their explanatory ambitions. Though progress is being made, the lack of theoretical predictions and experiments in quantum gravity makes it difficult to use empirical evidence to justify a theory of causality at quantum level in normal circumstances, i. e. by predicting the value of a well-confirmed experimental result. For a variety of reasons, the problem of the deterministic relationship between cause and effect is related to basic problems of physics as such. Despite the common belief, it is a remarkable fact that a theory of causality should be consistent with a theory of everything and is because of this linked to problems of a theory of everything. Thus far, solving the problem of causality can help to solve the problems of the theory of everything (at quantum level) too.

**Category:** Statistics

[35] **viXra:1505.0135 [pdf]**
*replaced on 2016-02-25 06:00:34*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 24 Pages.

Monte Carlo methods represent the *de facto* standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler proposal probability densities to draw candidate samples. The performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a *layered* (i.e., hierarchical) procedure to generate samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal density is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in the complexity. Furthermore, we provide a general unified importance sampling (IS) framework, where multiple proposal densities are employed and several IS schemes are introduced by applying the so-called deterministic mixture approach. Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms efficiently combine the benefits of both IS and MCMC methods.
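
The deterministic mixture approach mentioned in this abstract can be illustrated with a short sketch: each sample is drawn from one of several proposals, but its weight uses the equal-weight mixture of all proposals in the denominator. The Gaussian proposals, the shared scale `sigma`, and the function names below are assumptions for illustration and do not reproduce the layered or adaptive schemes of the paper.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def dm_importance_sampling(target, mus, sigma, n_per_proposal, rng):
    """Draw from each Gaussian proposal; weight by the full mixture."""
    samples, weights = [], []
    for mu in mus:
        for _ in range(n_per_proposal):
            x = rng.gauss(mu, sigma)
            # Deterministic mixture: equal-weight average of ALL proposals.
            mixture = sum(gauss_pdf(x, m, sigma) for m in mus) / len(mus)
            samples.append(x)
            weights.append(target(x) / mixture)
    return samples, weights


def estimates(samples, weights):
    """Return (normalizing-constant estimate, self-normalized mean)."""
    z_hat = sum(weights) / len(weights)
    mean_hat = sum(w * x for x, w in zip(samples, weights)) / sum(weights)
    return z_hat, mean_hat
```

Because the mixture covers every region reached by any proposal, a single badly placed proposal cannot produce the very large weights it would under the standard weight `target(x) / q_i(x)`.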

**Category:** Statistics

[34] **viXra:1505.0135 [pdf]**
*replaced on 2015-05-27 13:09:35*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 25 Pages.

Monte Carlo methods represent the de facto standard for approximating complicated integrals involving multidimensional target distributions. In order to generate random realizations from the target distribution, Monte Carlo techniques use simpler proposal probability densities for drawing candidate samples. Performance of any such method is strictly related to the specification of the proposal distribution, such that unfortunate choices easily wreak havoc on the resulting estimators. In this work, we introduce a layered, that is a hierarchical, procedure for generating samples employed within a Monte Carlo scheme. This approach ensures that an appropriate equivalent proposal density is always obtained automatically (thus eliminating the risk of a catastrophic performance), although at the expense of a moderate increase in the complexity. A hierarchical interpretation of two well-known methods, namely the random walk Metropolis-Hastings (MH) and the Population Monte Carlo (PMC) techniques, is provided. Furthermore, we provide a general unified importance sampling (IS) framework where multiple proposal densities are employed, and several IS schemes are introduced applying the so-called deterministic mixture approach. Finally, given these schemes, we also propose a novel class of adaptive importance samplers using a population of proposals, where the adaptation is driven by independent parallel or interacting Markov Chain Monte Carlo (MCMC) chains. The resulting algorithms combine efficiently the benefits of both IS and MCMC methods.

**Category:** Statistics

[33] **viXra:1503.0088 [pdf]**
*replaced on 2016-06-13 09:15:01*

**Authors:** Jianwen Huang, Jianjun Wang, Guowang Luo

**Comments:** 15 Pages.

We introduce the logarithmic generalized Maxwell distribution, which is an extension of the generalized Maxwell distribution. Some interesting properties of this distribution are studied, and the asymptotic distribution of the partial maximum of an independent and identically distributed sequence from the logarithmic generalized Maxwell distribution is derived. The expansion of the limit distribution of the normalized maxima is established under the optimal norming constants, which shows the rate of convergence of the distribution of the normalized maximum to the extreme value limit.

**Category:** Statistics

[32] **viXra:1412.0276 [pdf]**
*replaced on 2016-06-13 09:23:33*

**Authors:** Jianwen Huang, Jianjun Wang

**Comments:** 18 Pages.

In this paper, with optimal norming constants, the asymptotic expansions of the distribution and density of the normalized maxima from the generalized Maxwell distribution are derived. For the distributional expansion, it is shown that the convergence rate of the normalized maxima to the Gumbel extreme value distribution is proportional to $1/\log n$. For the density expansion, on the one hand, the main result is applied to establish the convergence rate of the density of the extreme to its limit. On the other hand, the main result is applied to obtain the asymptotic expansion of the moment of the maximum.

**Category:** Statistics

[31] **viXra:1409.0127 [pdf]**
*replaced on 2015-03-17 07:17:05*

**Authors:** Jianwen Huang, Shouquan Chen

**Comments:** 10 Pages.

Let $\{X_n,n\geq1\}$ be an independent and identically distributed random sequence with common distribution $F$ obeying the lognormal distribution. In this paper, we obtain the exact uniform convergence rate of the distribution of maxima to its extreme value limit under power normalization.

**Category:** Statistics

[30] **viXra:1409.0051 [pdf]**
*replaced on 2016-05-27 09:29:29*

**Authors:** L. Martino, F. Leisen, J. Corander

**Comments:** 21 Pages.

Markov Chain Monte Carlo (MCMC) algorithms and Sequential Monte Carlo (SMC) methods (a.k.a. particle filters) are well-known Monte Carlo methodologies, widely used in different fields for Bayesian inference and stochastic optimization. The Multiple Try Metropolis (MTM) algorithm is an extension of the standard Metropolis-Hastings (MH) algorithm in which the next state of the chain is chosen among a set of candidates, according to certain weights. The Particle MH (PMH) algorithm is another advanced MCMC technique specifically designed for scenarios where the multidimensional target density can be easily factorized as a product of conditional densities. PMH jointly combines SMC and MCMC approaches. Both MTM and PMH have been widely studied and applied in the literature. PMH variants have often been applied for the joint purpose of tracking dynamic variables and tuning constant parameters in a state space model. Furthermore, PMH can also be considered as an alternative particle smoothing method. In this work, we investigate connections, similarities and differences among MTM schemes and PMH methods. This study allows the design of novel efficient schemes for filtering and smoothing purposes in state space models. More specifically, one of them, called Particle Multiple Try Metropolis (P-MTM), obtains very promising results in different numerical simulations.
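
The MTM step described in this abstract (several candidates, one selected according to weights) can be sketched as follows. This is a textbook variant under stated assumptions, not the P-MTM scheme of the paper: it uses a symmetric Gaussian random-walk proposal with importance weights equal to the unnormalized target, and the bimodal target, function names, and default parameters are hypothetical.

```python
import math
import random


def target_unnorm(x):
    # Hypothetical bimodal target: equal mixture of N(-3, 1) and N(3, 1),
    # left unnormalized on purpose.
    return math.exp(-0.5 * (x + 3.0) ** 2) + math.exp(-0.5 * (x - 3.0) ** 2)


def mtm_step(x, n_tries, step, rng):
    """One Multiple Try Metropolis transition from state x."""
    # 1. Draw several candidates around the current state.
    candidates = [rng.gauss(x, step) for _ in range(n_tries)]
    weights = [target_unnorm(y) for y in candidates]
    total = sum(weights)
    if total == 0.0:
        return x  # every candidate has (numerically) zero density: reject
    # 2. Select one candidate with probability proportional to its weight.
    y = rng.choices(candidates, weights=weights, k=1)[0]
    # 3. Draw a reference set around y and include the current state in it.
    reference = [rng.gauss(y, step) for _ in range(n_tries - 1)] + [x]
    ref_total = sum(target_unnorm(z) for z in reference)
    # 4. Accept with the generalized Metropolis ratio of the two weight sums.
    accept = min(1.0, total / ref_total)
    return y if rng.random() < accept else x


def run_mtm(n_steps, n_tries=5, step=3.0, x0=0.0, seed=0):
    """Run the chain and return all visited states, including x0."""
    rng = random.Random(seed)
    chain = [x0]
    for _ in range(n_steps):
        chain.append(mtm_step(chain[-1], n_tries, step, rng))
    return chain


if __name__ == "__main__":
    chain = run_mtm(20000)
    print(sum(chain) / len(chain))
```

Including the current state in the reference set is what keeps the chain reversible with respect to the target; with `n_tries=1` the step reduces to ordinary random-walk Metropolis.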

**Category:** Statistics

[29] **viXra:1409.0051 [pdf]**
*replaced on 2016-05-25 09:33:48*

**Authors:** L. Martino, F. Leisen, J. Corander

**Comments:** 20 Pages.

Markov Chain Monte Carlo (MCMC) algorithms and Sequential Monte Carlo (SMC) methods (a.k.a. particle filters) are well-known Monte Carlo methodologies, widely used in different fields for Bayesian inference and stochastic optimization. The Multiple Try Metropolis (MTM) algorithm is an extension of the standard Metropolis-Hastings (MH) algorithm in which the next state of the chain is chosen among a set of candidates, according to certain weights. The Particle MH (PMH) algorithm is another advanced MCMC technique specifically designed for scenarios where the multidimensional target density can be easily factorized as a product of conditional densities. PMH combines SMC and MCMC approaches. Both MTM and PMH have been widely studied and applied in the literature. PMH variants have often been applied for the joint purpose of tracking dynamic variables and tuning constant parameters in a state space model. Furthermore, PMH can also be considered as an alternative particle smoothing method. In this work, we investigate similarities and differences among the MTM schemes and the PMH method. This study allows the design of novel efficient schemes for filtering and smoothing purposes in state space models. In particular, one of them, called particle Multiple Try Metropolis (P-MTM), obtains very promising results in different numerical simulations.

**Category:** Statistics

[28] **viXra:1409.0051 [pdf]**
*replaced on 2016-03-17 14:39:23*

**Authors:** L. Martino, F. Leisen, J. Corander

**Comments:** 16 Pages.

Markov Chain Monte Carlo (MCMC) methods are well-known Monte Carlo methodologies, widely used in different fields for statistical inference and stochastic optimization. The Multiple Try Metropolis (MTM) algorithm is an extension of the standard Metropolis-Hastings (MH) algorithm in which the next state of the chain is chosen among a set of candidates, according to certain weights. The Particle MH (PMH) algorithm is another advanced MCMC technique specifically designed for scenarios where the multidimensional target density can be easily factorized as a product of (lower-dimensional) conditional densities. Both have been widely studied and applied in the literature. In this note, we investigate similarities and differences among the MTM schemes and the PMH method. Furthermore, novel schemes are also designed.

**Category:** Statistics

[27] **viXra:1409.0015 [pdf]**
*replaced on 2014-12-15 15:30:35*

**Authors:** Ellida M. Khazen

**Comments:** Pages. The paper is being published in Cogent Mathematics (2016), 2:1134031. http://dx.doi.org/10.1080/23311835.2015.1134031

The problem of filtering the unobservable components x(t) of a multidimensional continuous diffusion Markov process z(t)=(x(t),y(t)), given observations of the (multidimensional) process y(t) taken at discrete consecutive times with small time steps, is investigated analytically. On the basis of that investigation, new algorithms for simulating the unobservable components x(t) and new nonlinear filtering algorithms based on sequential Monte Carlo methods, or particle filters, are developed. An analytical investigation of the observed quadratic variations is also carried out. New closed-form analytical formulae are obtained, which characterize the dispersions of deviations of the observed quadratic variations and the accuracy of some estimates for x(t). As an illustrative example, estimation of volatility (for problems of financial mathematics) is considered. The new algorithms extend the range of applications of sequential Monte Carlo methods, or particle filters, beyond hidden Markov models and improve their performance.

**Category:** Statistics

[26] **viXra:1407.0133 [pdf]**
*replaced on 2017-12-30 09:54:44*

**Authors:** L. Martino, D. Luengo

**Comments:** 23 Pages. (to appear) Communications in Statistics - Simulation and Computation, 2018.

Multipath fading is one of the most common distortions in wireless communications. The simulation of a fading channel typically requires drawing samples from a Rayleigh, Rice or Nakagami distribution. The Nakagami-m distribution is particularly important due to its good agreement with empirical channel measurements, as well as its ability to generalize the well-known Rayleigh and Rice distributions. In this paper, a simple and extremely efficient rejection sampling (RS) algorithm for generating independent samples from a Nakagami-m distribution is proposed. This RS approach is based on a novel hat function composed of three pieces of well-known densities from which samples can be drawn easily and efficiently. The proposed method is valid for any combination of parameters of the Nakagami distribution, without any restriction in the domain and without requiring any adjustment from the final user. Simulations for several parameter combinations show that the proposed approach attains acceptance rates above 90% in all cases, outperforming all the RS techniques currently available in the literature.
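
For context, a common baseline for drawing Nakagami-m samples (not the rejection sampler proposed in this paper) uses the fact that the square of a Nakagami variable with shape m and spread Ω follows a Gamma distribution with shape m and scale Ω/m. The sketch below assumes that parameterization; the function name and arguments are hypothetical.

```python
import math
import random


def nakagami_samples(m, omega, n, seed=0):
    """Draw n Nakagami-m samples through the Gamma transform.

    Assumes the usual parameterization: shape m >= 0.5 and spread
    omega = E[X^2] > 0. This is a reference baseline, not the
    rejection sampling scheme described in the abstract.
    """
    if m < 0.5 or omega <= 0.0:
        raise ValueError("require m >= 0.5 and omega > 0")
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); X = sqrt(G) is Nakagami.
    return [math.sqrt(rng.gammavariate(m, omega / m)) for _ in range(n)]


if __name__ == "__main__":
    xs = nakagami_samples(m=1.5, omega=2.0, n=20000, seed=1)
    # The sample second moment should be close to omega.
    print(sum(x * x for x in xs) / len(xs))
```

A sampler such as the one proposed in the paper could be checked against this baseline by comparing sample moments or empirical distribution functions.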

**Category:** Statistics

[25] **viXra:1405.0280 [pdf]**
*replaced on 2015-03-25 13:29:09*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** IEEE Transactions on Signal Processing, Volume 63, Issue 16, Pages 4422-4437, 2015

Monte Carlo (MC) methods are well-known computational techniques, widely used in different fields such as signal processing, communications and machine learning. An important class of MC methods is composed of importance sampling (IS) and its adaptive extensions, such as population Monte Carlo (PMC) and adaptive multiple IS (AMIS). In this work, we introduce a novel adaptive and iterated importance sampler using a population of proposal densities. The proposed algorithm, named adaptive population importance sampling (APIS), provides a global estimation of the variables of interest iteratively, making use of all the samples previously generated. APIS combines a sophisticated scheme to build the IS estimators (based on the deterministic mixture approach) with a simple temporal adaptation (based on epochs). In this way, APIS is able to keep all the advantages of both AMIS and PMC, while minimizing their drawbacks. Furthermore, APIS is easily parallelizable. The cloud of proposals is adapted in such a way that local features of the target density can be better taken into account compared to single global adaptation procedures. The result is a fast, simple, robust and high-performance algorithm applicable to a wide range of problems. Numerical results show the advantages of the proposed sampling scheme in four synthetic examples and a localization problem in a wireless sensor network.

**Category:** Statistics

[24] **viXra:1405.0280 [pdf]**
*replaced on 2014-07-04 10:52:29*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 19 Pages.

Monte Carlo (MC) methods are well-known computational techniques widely used in different fields such as signal processing, communications and machine learning. An important class of MC methods is composed of importance sampling (IS) and its adaptive extensions, e.g., Adaptive Multiple IS (AMIS) and Population Monte Carlo (PMC). In this work, we introduce a novel adaptive and iterated importance sampler using a population of proposal densities. The proposed algorithm, named *Adaptive Population Importance Sampling* (APIS), provides a global estimation of the variables of interest iteratively, making use of all the samples previously generated. APIS combines a sophisticated scheme to build the IS estimators (based on the deterministic mixture approach) with a simple temporal adaptation (based on epochs). In this way, APIS is able to keep all the advantages of both AMIS and PMC while minimizing their drawbacks. Furthermore, the cloud of proposals is adapted in such a way that local features of the target density can be better taken into account compared to single global adaptation procedures. The result is a fast, simple, robust and high-performance algorithm applicable to a wide range of problems. Numerical results show the advantages of the proposed sampling scheme for a toy example and a localization problem in a wireless sensor network.

**Category:** Statistics

[23] **viXra:1405.0280 [pdf]**
*replaced on 2014-05-23 12:13:47*

**Authors:** L. Martino, V. Elvira, D. Luengo, J. Corander

**Comments:** 20 Pages.

Monte Carlo (MC) methods are well-known computational techniques used in different fields such as signal processing, communications, and machine learning. An important class of MC methods is composed of importance sampling (IS) and its adaptive extensions, e.g., Adaptive Multiple IS (AMIS) and Population Monte Carlo (PMC). In this work, we introduce an adaptive and iterated importance sampler using a population of proposal densities. The novel algorithm, called *Adaptive Population Importance Sampling* (APIS), iteratively provides a global estimation of the variables of interest, using all the samples generated. APIS mixes together different convenient features of the AMIS and PMC schemes. Furthermore, APIS simultaneously uses simple and more sophisticated approaches (such as the deterministic mixture) to build the IS estimators. The cloud of proposals is adapted by learning from a subset of previously generated samples, in such a way that local features of the target density can be better taken into account compared to single global adaptation procedures. Numerical results show the advantages of the proposed sampling scheme in terms of mean square error. The resulting algorithm is also more robust to the initial choice of the parameters than other techniques such as AMIS and PMC.

**Category:** Statistics

[22] **viXra:1405.0263 [pdf]**
*replaced on 2015-04-09 13:23:39*

**Authors:** L. Martino, H. Yang, D. Luengo, J. Kanniainen, J. Corander

**Comments:** Digital Signal Processing, Volume 47, Pages 68-83, 2015.

Bayesian inference often requires efficient numerical approximation algorithms, such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) methods. The Gibbs sampler is a well-known MCMC technique, widely applied in many signal processing problems. Drawing samples from univariate full-conditional distributions efficiently is essential for the practical application of the Gibbs sampler. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm which produces virtually independent samples from these univariate target densities. The proposal density used is self-tuned and tailored to the specific target, but it is not adaptive. Instead, the proposal is adjusted during an initial optimization stage, following a simple and extremely effective procedure. Hence, we have named the newly proposed approach FUSS (Fast Universal Self-tuned Sampler), as it can be used to sample from any bounded univariate distribution and also from any bounded multivariate distribution, either directly or by embedding it within a Gibbs sampler. Numerical experiments on several synthetic data sets (including a challenging parameter estimation problem in a chaotic system) and a high-dimensional financial signal processing problem show its good performance in terms of speed and estimation accuracy.

**Category:** Statistics

[21] **viXra:1405.0263 [pdf]**
*replaced on 2014-07-02 10:33:21*

**Authors:** L. Martino, H. Yang, D. Luengo, J. Kanniainen, J. Corander

**Comments:** 18 Pages.

Bayesian inference often requires efficient numerical approximation algorithms such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) methods. The Gibbs sampler is a well-known MCMC technique widely applied in several fields (e.g., machine learning, finance, etc.). In the application of the Gibbs sampler one needs to efficiently generate values from univariate full-conditional distributions. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm which produces virtually independent samples from univariate target densities. The proposal density used is self-tuned and tailored to the specific target, but it is not adaptive. Indeed, the proposal is adjusted during an initialization stage following a simple procedure. As a consequence, there is no "fuss" about convergence or tuning, and the execution of the algorithm is remarkably sped up. Although it can be used as a stand-alone algorithm to sample from a generic univariate distribution, the proposed approach is particularly suited for its use within a Gibbs sampler, especially when sampling from spiky multi-modal distributions. Hence, we call it FUSS (Fast Universal Self-tuned Sampler). Numerical experiments on several data sets show its good performance in terms of speed and estimation accuracy.

**Category:** Statistics

[20] **viXra:1405.0263 [pdf]**
*replaced on 2014-06-02 04:58:08*

**Authors:** L. Martino, H. Yang, D. Luengo, J. Kanniainen, J. Corander

**Comments:** 15 Pages.

Gibbs sampling is a well-known Markov Chain Monte Carlo (MCMC) technique, widely applied to draw samples from multivariate target distributions which appear often in many different fields (machine learning, finance, signal processing, etc.). The application of the Gibbs sampler requires being able to draw efficiently from the univariate full-conditional distributions. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm that produces virtually independent samples from the target. The proposal density used is self-tuned to the specific target but it is not adaptive. Instead, the proposal is adjusted during the initialization stage following a simple procedure. As a consequence, there is no "fuss" about convergence or tuning, and the execution of the algorithm is remarkably sped up. Although it can be used as a stand-alone algorithm to sample from a generic univariate distribution, the proposed approach is particularly suited for its use within a Gibbs sampler, especially when sampling from spiky multi-modal distributions. Hence, we call it FUSS (Fast Universal Self-tuned Sampler). Numerical experiments on several synthetic and real data sets show its good performance in terms of speed and estimation accuracy.

**Category:** Statistics