A general approach for mutual information minimization and its application to blind source separation☆
Introduction
Let $s_1(t),\ldots,s_M(t)$ be $M$ statistically independent source signals, from which only $N$ different mixtures $x_1(t),\ldots,x_N(t)$ have been observed, i.e., $x_i(t)=\mathcal{F}_i\big(s_1(t),\ldots,s_M(t)\big)$, $i=1,\ldots,N$, where $\mathcal{F}$ denotes the unknown mixing system, which may be nonlinear and/or have memory. In vector form, the above equation is written as $\mathbf{x}=\mathcal{F}(\mathbf{s})$, where $\mathbf{s}$ and $\mathbf{x}$ are the source and observation vectors, respectively. The goal of blind source separation (BSS) is then to recover the original source signals knowing only these observations. The problem is called blind because little or no information is available about the sources or about the mixing system. It has applications in several areas, including feature extraction, brain imaging, telecommunications and speech enhancement [18]. Throughout this paper, it is assumed that the number of observations equals the number of sources, that is, $N=M$.
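As a concrete special case of this model, the following sketch generates a memoryless linear mixture $\mathbf{x}=A\mathbf{s}$ (the dimensions, source distributions and mixing matrix are illustrative choices, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

M = 3   # number of sources (= number of observations, N = M)
T = 1000  # samples per signal

# M independent sources; uniform white signals are just one convenient choice
s = rng.uniform(-1.0, 1.0, size=(M, T))

# Unknown memoryless linear mixing system F(s) = A s (A square and invertible),
# a special case of the general, possibly nonlinear and dynamic, model
A = rng.normal(size=(M, M))
x = A @ s  # observed mixtures, one row per sensor
```

Only `x` is available to a separation algorithm; `s` and `A` are kept here solely to check the result.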
Since the sole assumption is the independence of the sources, the basic idea of blind source separation consists in estimating a system $\mathcal{G}$, only from the observed data $\mathbf{x}$, such that the components of $\mathbf{y}=\mathcal{G}(\mathbf{x})$ are statistically independent. This method, based on statistical independence, constitutes a generic approach called independent component analysis (ICA). In the general (nonlinear) case, it can be shown [19] that ICA does not lead to BSS. However, if some structural constraints are imposed on the mixing and separating systems, the ICA approach may result in BSS, possibly up to a few indeterminacies. For example, when both mixing and separating systems are assumed to be linear and memoryless (i.e., $\mathbf{x}=A\mathbf{s}$ and $\mathbf{y}=B\mathbf{x}$, where $A$ and $B$ are regular matrices), we have the well-known linear instantaneous mixtures, for which the equivalence of ICA and BSS has already been proved [12]: if the components of $\mathbf{y}$ are independent, and if there is at most one Gaussian source, then the outputs will be equal to the source signals up to scale and permutation indeterminacies.
Consequently, to use the ICA approach for source separation, we impose a structural constraint on the mixing and separating systems. The separating system is then a parametric mapping
$$\mathbf{y}=g(\mathbf{x};\boldsymbol{\theta}),\tag{2}$$
where $g$ is a "known" separating system with "unknown" parameters $\boldsymbol{\theta}$. It is also required that this mixing-separating model be "separable", that is, that the independence of the outputs (ICA) ensures the separation of the sources (BSS). Under these conditions, the problem of source separation reduces to finding the parameter $\boldsymbol{\theta}$ that maximizes the independence of the outputs.
To measure the degree of independence of the outputs, their mutual information may be used. The mutual information of the random variables $y_1,\ldots,y_N$ is defined as [13]
$$I(\mathbf{y})=\int p_{\mathbf{y}}(\mathbf{y})\,\ln\frac{p_{\mathbf{y}}(\mathbf{y})}{\prod_{i=1}^{N}p_{y_i}(y_i)}\,d\mathbf{y},$$
where $p_{\mathbf{y}}$ is the joint probability density function (PDF) of $\mathbf{y}$ and the $p_{y_i}$ are its marginal PDFs. This is nothing but the Kullback–Leibler divergence between $p_{\mathbf{y}}(\mathbf{y})$ and $\prod_i p_{y_i}(y_i)$. It is well known that $I(\mathbf{y})$ is always nonnegative and vanishes if and only if the components of $\mathbf{y}$ are independent. Consequently, the solution of the BSS problem for model (2) is the parameter vector $\boldsymbol{\theta}$ that minimizes $I(\mathbf{y})$. Mutual information has already been used in BSS by several authors (e.g., [1], [10], [12], [22]).
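The two defining properties above (nonnegativity, and vanishing exactly under independence) can be illustrated numerically. The following sketch uses a simple 2-D histogram plug-in estimate of $I$ for two scalar variables; the bin count and sample size are illustrative choices, not taken from the paper:

```python
import numpy as np

def mutual_information(x, y, bins=20):
    """Plug-in estimate of I(x;y) in nats from a 2-D histogram.

    This is the Kullback-Leibler divergence between the empirical joint
    distribution and the product of its own marginals, so it is always >= 0,
    and it is (up to estimation bias) zero iff x and y are independent.
    """
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

rng = np.random.default_rng(1)
x = rng.normal(size=20000)
noise = rng.normal(size=20000)

mi_indep = mutual_information(x, noise)          # independent -> near 0
mi_dep = mutual_information(x, x + 0.1 * noise)  # dependent   -> clearly > 0
```

Note that the plug-in estimate carries a small positive bias at independence (roughly $(\text{bins}-1)^2/2T$ nats), which a practical algorithm must tolerate.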
For minimizing $I(\mathbf{y})$ with respect to $\boldsymbol{\theta}$, gradient-based methods may be applied. To construct such a method, it is very useful to know the "differential" of the mutual information, that is, its variation resulting from a small deviation in its argument. Such a nonparametric differential has recently been proposed [8] (see also Section 3).
The aim of this paper is to develop two general gradient-based approaches (called the "gradient" and "minimization–projection" approaches) for minimizing the mutual information of the outputs of a model like (2). These approaches, and model (2), are quite general and can be used in any mutual information minimization problem. In BSS, they provide a unifying framework and powerful tools for separating any complicated (yet separable) mixing model. As examples, we use these approaches for separating four separable mixing models (see Sections 7–10). Moreover, it is shown in this paper that mutual information has no "local minimum" (see Theorem 2 and Remark 2 after the theorem).
The paper is organized as follows. In Section 2, multivariate score functions and the score function difference (SFD) are introduced and their properties are discussed. Using the SFD, the "gradient" of mutual information is presented in Section 3. Using the properties of multivariate score functions and the SFD, we propose algorithms for estimating multivariate score functions in Section 4. Two general approaches for minimizing mutual information are then introduced in Section 5. These approaches are used for blindly separating the four separable models presented in Section 6: linear instantaneous mixtures in Section 7, linear convolutive mixtures in Section 8, post-nonlinear (PNL) mixtures in Section 9 and convolutive post-nonlinear (CPNL) mixtures in Section 10.
Multi-variate score functions
Multivariate score functions were first introduced in [3], by extending the concept of the score function of a random variable to random vectors. The "gradient" of mutual information can be expressed in terms of these score functions (see Theorem 1). This section starts with a review of the definitions and then discusses their properties.
“Gradient” of mutual information
In this section, we use the multivariate score functions defined previously to derive a "gradient" for mutual information. The variation of mutual information resulting from a small deviation in its argument (which may be called the "differential" of mutual information) is given by the following theorem [8]. Theorem 1 (Differential of mutual information). Let $\mathbf{x}$ be a bounded random vector, and let $\boldsymbol{\Delta}$ be a "small" random vector with the same dimension. Then
$$I(\mathbf{x}+\boldsymbol{\Delta})-I(\mathbf{x})=E\{\boldsymbol{\Delta}^{T}\boldsymbol{\beta}_{\mathbf{x}}(\mathbf{x})\}+o(\boldsymbol{\Delta}),$$
where $\boldsymbol{\beta}_{\mathbf{x}}$ is the SFD of $\mathbf{x}$, and $o(\boldsymbol{\Delta})$ denotes higher-order terms in $\boldsymbol{\Delta}$.
Estimating multi-variate score functions
To use the SFD in minimizing mutual information, it must first be estimated from the observed data. The marginal score function (MSF) of a random vector is nothing but the vector of score functions of its components, and hence it can be estimated by any estimation method for the score function of a scalar variable (see for example [14], [28], [30]). In this section, the estimation of the joint score function (JSF) and of the SFD is considered.
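As a minimal illustration of such estimators (the paper discusses several; the Gaussian kernel smoother below is only an assumed, generic choice, not the paper's method), the score function $\varphi(y)=-p'(y)/p(y)$ of a scalar variable can be estimated from samples as follows:

```python
import numpy as np

def kernel_score(samples, points, h=0.25):
    """Kernel estimate of the scalar score function
    phi(y) = -d/dy log p(y) = -p'(y)/p(y),
    using a Gaussian kernel of bandwidth h (an assumed, fixed choice)."""
    d = (points[:, None] - samples[None, :]) / h  # (n_points, n_samples)
    k = np.exp(-0.5 * d**2)                       # unnormalised Gaussian kernel
    p = k.sum(axis=1)                             # proportional to p(y)
    dp = (-d / h * k).sum(axis=1)                 # proportional to p'(y)
    return -dp / p                                # normalisation cancels

rng = np.random.default_rng(2)
y = rng.normal(size=20000)       # standard Gaussian: true score is phi(y) = y
pts = np.array([-1.0, 0.0, 1.0])
phi_hat = kernel_score(y, pts)   # should be close to [-1, 0, 1]
```

The kernel bandwidth trades bias (smoothing shrinks the estimate toward $y/(1+h^2)$ for Gaussian data) against variance; any density-based or penalized estimator such as [14] could be substituted.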
Two general approaches for mutual information minimization
In this section, we propose two gradient-based approaches for minimizing $I(\mathbf{y})$ under the model of Eq. (2). Both approaches are based on Theorem 1, using the SFD as a nonparametric gradient of the mutual information.
Some separable models for blind source separation
In the following sections, we will use the presented gradient and MP approaches for blind source separation. In BSS, a parametric model for the separating system is required, because general nonlinear mixtures are not separable [19] (i.e., output independence does not ensure source separation). The four separable models, for which separation algorithms will be addressed in the following sections, are:
(1) Linear instantaneous mixtures: this is the simplest case, in which the mixing system is a regular (invertible) matrix $A$, i.e., $\mathbf{x}=A\mathbf{s}$.
Application to linear instantaneous mixtures
Here, as an illustration of the "gradient" and "MP" approaches, we apply them to the separation of linear instantaneous mixtures. All of the algorithms and programs of this section are available as a MATLAB package with a graphical interface at
http://www.lis.inpg.fr/pages_perso/bliss/deliverables/d20.html
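As a self-contained toy version of this application (not the algorithm of the MATLAB package: the exhaustive angle scan below replaces the paper's gradient updates, and all sizes are illustrative), one can whiten two mixtures and then pick the residual rotation that minimizes a histogram estimate of the outputs' mutual information:

```python
import numpy as np

def mutual_information(a, b, bins=30):
    """Histogram plug-in estimate of I(a;b) in nats (>= 0, ~0 iff independent)."""
    p, _, _ = np.histogram2d(a, b, bins=bins)
    p = p / p.sum()
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    m = p > 0
    return float((p[m] * np.log(p[m] / (pa @ pb)[m])).sum())

rng = np.random.default_rng(3)
T = 20000
s = rng.uniform(-1, 1, size=(2, T))      # two independent (uniform) sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # unknown regular mixing matrix
x = A @ s                                # observations

# 1) Whitening: z = W x has identity covariance, so the remaining unknown
#    part of the separating matrix is an orthogonal transform (a rotation,
#    up to sign/permutation).
lam, E = np.linalg.eigh(np.cov(x))
z = (E / np.sqrt(lam)).T @ x             # W = diag(lam)^(-1/2) E^T

# 2) Scan the rotation angle (period pi/2 up to the BSS indeterminacies)
#    and keep the angle whose outputs have minimum mutual information.
angles = np.linspace(0.0, np.pi / 2, 90, endpoint=False)
mis = []
for t in angles:
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    yt = R @ z
    mis.append(mutual_information(yt[0], yt[1]))
best = angles[int(np.argmin(mis))]
R = np.array([[np.cos(best), -np.sin(best)], [np.sin(best), np.cos(best)]])
y = R @ z   # recovered sources, up to scale and permutation
```

The brute-force scan works only in this two-source case; the gradient and MP approaches replace it with updates driven by the SFD, and scale to larger dimensions.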
Application to convolutive mixtures
In this section, we show how the gradient and the MP approaches can be used in separating convolutive mixtures of two sources.
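For concreteness, a convolutive mixture of two sources through a $2\times2$ bank of FIR filters (filter length and coefficients are arbitrary illustrative choices) can be simulated as:

```python
import numpy as np

rng = np.random.default_rng(4)
T, L = 5000, 3                       # signal length, mixing-filter length
s = rng.uniform(-1, 1, size=(2, T))  # two independent sources

# Convolutive mixture: x_i(n) = sum_j sum_k a_ij[k] s_j(n - k),
# i.e. each source reaches each sensor through an FIR filter a_ij.
Afir = rng.normal(scale=0.5, size=(2, 2, L))
Afir[0, 0, 0] = Afir[1, 1, 0] = 1.0  # make the direct paths dominant

x = np.zeros((2, T))
for i in range(2):
    for j in range(2):
        x[i] += np.convolve(s[j], Afir[i, j])[:T]
```

The separating system must then itself be a filter matrix, so the parameter vector $\boldsymbol{\theta}$ of model (2) collects FIR coefficients rather than a single matrix.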
Application to PNL mixtures
In this section, we consider the application of the general mutual information minimization approaches of Section 5 (gradient and MP approaches) in separating PNL mixtures (Fig. 2).
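A minimal sketch of the PNL structure (with tanh as an assumed invertible nonlinearity, and illustrative sizes): the mixture is a linear stage followed by componentwise functions, and the mirror separating structure recovers the sources exactly when the oracle inverses are used; a blind algorithm must of course estimate both stages from the data alone.

```python
import numpy as np

rng = np.random.default_rng(5)
s = rng.uniform(-1, 1, size=(2, 5000))   # independent sources
A = np.array([[1.0, 0.5], [0.5, 1.0]])

# PNL mixture: a linear mixture followed by componentwise invertible
# nonlinearities f_i (here f_i = tanh):   x_i = f_i( (A s)_i )
x = np.tanh(A @ s)

# Mirror separating structure: componentwise functions g_i followed by a
# matrix B.  With the oracle choices g_i = f_i^{-1} = arctanh and B = A^{-1},
# the sources are recovered exactly.
y = np.linalg.inv(A) @ np.arctanh(x)
```

This mirror structure is exactly model (2): $\boldsymbol{\theta}$ collects the parameters of the componentwise functions $g_i$ and of the matrix $B$.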
Application to CPNL mixtures
We conclude the applications of the gradient and MP approaches by using them for separating CPNL mixtures of two sources. In CPNL mixtures, the separating system is composed of the linear FIR filter (48) and componentwise nonlinear functions. Consequently, the separation algorithms can be obtained by combining the algorithms obtained for convolutive and PNL mixtures.
Conclusion
In this paper, we introduced the concept of the SFD, which can be considered as a nonparametric "gradient" of the mutual information. We also showed that the SFD is a powerful tool for minimizing the mutual information of a parametric model, and we proposed two general approaches, called the gradient and MP approaches. These approaches give a unifying view and powerful tools for separating any complicated (yet separable) mixing model (for example, to the best of our knowledge, there is currently no other method for …
References (33)

- P. Comon, Independent component analysis, a new concept?, Signal Processing (1994).
- Three easy ways for separating nonlinear mixtures?, Signal Processing (February 2004).
- Parametric approach to blind deconvolution of nonlinear channels, Neurocomputing (2002).
- L.B. Almeida, MISEP — linear and nonlinear ICA based on mutual information, J. Mach. Learning Res. (December 2003).
- S.-I. Amari, Natural gradient works efficiently in learning, Neural Comput. (1998).
- Separating convolutive mixtures by mutual information minimization.
- Blind separating convolutive post-nonlinear mixtures.
- M. Babaie-Zadeh, C. Jutten, K. Nayebi, A geometric approach for separating post non-linear mixtures, in: EUSIPCO, ...
- Using multivariate score functions in source separation: application to post non-linear mixtures, Scientia Iranica (2002).
- Minimization-projection (MP) approach for blind source separation in different mixing models.
- M. Babaie-Zadeh, C. Jutten, K. Nayebi, Differential of mutual information, IEEE Signal Process. Lett.
- J.-F. Cardoso, Blind signal separation: statistical principles, Proc. IEEE.
- J.-F. Cardoso, B. Laheld, Equivariant adaptive source separation, IEEE Trans. Signal Process.
- T.M. Cover, J.A. Thomas, Elements of Information Theory.
- A penalty method for nonparametric estimation of the logarithmic derivative of a density function, Ann. Inst. Statist. Math.
☆ This work has been partially funded by the European project BLISS (IST-1999-14190), by Sharif University of Technology, and by the Iran Telecommunication Research Center (ITRC).

1 Ch. Jutten is a professor at Université Joseph Fourier.