Signal Processing, Volume 85, Issue 5, May 2005, Pages 975-995

A general approach for mutual information minimization and its application to blind source separation

https://doi.org/10.1016/j.sigpro.2004.11.021

Abstract

In this paper, a nonparametric "gradient" of the mutual information is first introduced. It is used to show that mutual information has no local minima. Using this "gradient", two general gradient-based approaches for minimizing mutual information in a parametric model are then presented. These approaches are quite general and, in principle, can be used in any mutual information minimization problem. In blind source separation, they provide powerful tools for separating any complicated (yet separable) mixing model. In this paper, they are used to develop algorithms for separating four separable mixing models: linear instantaneous, linear convolutive, post nonlinear (PNL) and convolutive post nonlinear (CPNL) mixtures.

Introduction

Let $s_1(t), s_2(t), \ldots, s_M(t)$ be $M$ statistically independent source signals, from which only $N$ different mixtures $x_1(t), x_2(t), \ldots, x_N(t)$ have been observed, i.e.,
$$x_i(t) = \mathcal{F}_i\big(s_1(t), \ldots, s_M(t)\big), \quad i = 1, \ldots, N,$$
where $\mathcal{F}_i$ denotes the unknown mixing system, which may be nonlinear and/or have memory. In vector form, the above equation is written as $\mathbf{x}(t) = \mathcal{F}(\mathbf{s}(t))$, where $\mathbf{s}(t) \triangleq (s_1(t), \ldots, s_M(t))^T$ and $\mathbf{x}(t) \triangleq (x_1(t), \ldots, x_N(t))^T$ are the source and observation vectors, respectively. The goal of blind source separation (BSS) is then to recover the original source signals knowing only these observations. The problem is called blind since there is no or very little information about the sources or about the mixing system. This problem has applications in different areas including feature extraction, brain imaging, telecommunications and speech enhancement [18]. Throughout this paper, it is always assumed that the number of observations is equal to the number of sources, that is, $M = N$.
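
To make the mixing model concrete in its simplest form, the following sketch generates a linear instantaneous mixture of two independent sources. The mixing matrix, the source distributions and the sample size are arbitrary choices for illustration, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10_000                       # number of time samples

# Two statistically independent sources (arbitrary non-Gaussian choices).
s = np.vstack([
    rng.uniform(-1.0, 1.0, T),   # s1: uniform
    rng.laplace(0.0, 1.0, T),    # s2: Laplacian
])

# Linear instantaneous mixing x = A s, with an arbitrary regular matrix A.
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])
x = A @ s                        # observations, shape (2, T)

print("sample correlation of the mixtures:\n", np.corrcoef(x))
```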

Since the sole assumption is the independence of the sources, the basic idea of blind source separation consists in estimating a system $\mathcal{G}$, only from the observed data $\mathbf{x}(t)$, such that the components of $\mathbf{y}(t) = \mathcal{G}(\mathbf{x}(t))$ are statistically independent. This method, based on statistical independence, constitutes a generic approach called independent component analysis (ICA). In the general (nonlinear) case, it can be shown [19] that ICA does not lead to BSS. However, if some structural constraints are imposed on the mixing and separating systems, the ICA approach may result in BSS, possibly with a few indeterminacies. For example, when both the mixing and separating systems are assumed to be linear and memoryless (i.e., $\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t)$ and $\mathbf{y}(t) = \mathbf{B}\mathbf{x}(t)$, where $\mathbf{A}$ and $\mathbf{B}$ are $N \times N$ regular matrices), we have the well-known linear instantaneous mixtures, for which the equivalence of ICA and BSS has already been proved [12]: if the components of $\mathbf{y}$ are independent, and if there is at most one Gaussian source, then the outputs are equal to the source signals up to a scale and a permutation indeterminacy.

Consequently, to use the ICA approach for source separation, we impose a structural constraint on the mixing and separating systems. The separating system is then a parametric mapping
$$\mathbf{y}(t) = \mathcal{G}(\mathbf{x}(t); \boldsymbol{\theta}), \qquad (2)$$
where $\mathcal{G}$ is a "known" separating structure with "unknown" parameters $\boldsymbol{\theta}$. It is also required that this mixing-separating model be "separable", that is, that the independence of the outputs (ICA) ensures the separation of the sources (BSS). Under these conditions, the problem of source separation reduces to finding the parameter vector $\boldsymbol{\theta}$ that maximizes the independence of the outputs.
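
As one concrete instance of such a parametric mapping, the following sketch encodes a post nonlinear (PNL) separating structure: a component-wise nonlinearity applied to each observation, followed by a linear demixing matrix. The cubic form chosen for the nonlinearities and the example parameter values are illustrative assumptions only, not the parameterization used in the paper.

```python
import numpy as np

def separate_pnl(x, theta):
    """Parametric separating mapping y = G(x; theta) for a PNL structure:
    component-wise nonlinearities g_i followed by a linear demixing matrix B.
    Here g_i(v) = v + a_i * v**3 is an assumed, illustrative form
    (invertible for a_i >= 0)."""
    a, B = theta                   # a: per-channel cubic coefficients, B: (N, N) matrix
    g_x = x + a[:, None] * x**3    # nonlinear compensation stage
    return B @ g_x                 # linear demixing stage

# Example call with arbitrary placeholder parameters.
x = np.random.default_rng(1).standard_normal((2, 1000))   # two observed channels
theta = (np.array([0.1, 0.05]), np.eye(2))
y = separate_pnl(x, theta)
```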

To measure the degree of independence of the outputs, their mutual information may be used. The mutual information of the random variables $y_1, y_2, \ldots, y_N$ is defined as [13]
$$I(\mathbf{y}) = \int p_{\mathbf{y}}(\mathbf{y}) \ln \frac{p_{\mathbf{y}}(\mathbf{y})}{\prod_i p_{y_i}(y_i)} \, \mathrm{d}\mathbf{y},$$
where $\mathbf{y} = (y_1, \ldots, y_N)^T$. This is nothing but the Kullback–Leibler divergence between $p_{\mathbf{y}}(\mathbf{y})$ and $\prod_i p_{y_i}(y_i)$. It is well known that $I(\mathbf{y})$ is always nonnegative and vanishes if and only if the components of $\mathbf{y}$ are independent. Consequently, the solution of the BSS problem for model (2) is the vector $\boldsymbol{\theta}$ that minimizes $I(\mathbf{y})$. Mutual information has already been used in BSS by several authors (e.g., [1], [10], [12], [22]).
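
As a rough numerical illustration of this definition, the sketch below estimates the mutual information of two scalar outputs from samples, using the identity $I(y_1, y_2) = H(y_1) + H(y_2) - H(y_1, y_2)$ with simple histogram entropy estimates. The binning, sample sizes and distributions are arbitrary, and this plug-in estimator is only for intuition; it is not one of the estimators used in the paper.

```python
import numpy as np

def entropy_hist(samples, bins):
    """Plug-in differential entropy estimate from a histogram (samples: (N, D))."""
    p, edges = np.histogramdd(samples, bins=bins, density=True)
    widths = [np.diff(e) for e in edges]
    vol = np.prod(np.meshgrid(*widths, indexing="ij"), axis=0)   # cell volumes
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]) * vol[mask])

def mutual_information(y1, y2, bins=30):
    """I(y1, y2) = H(y1) + H(y2) - H(y1, y2), histogram-based."""
    h1 = entropy_hist(y1[:, None], [bins])
    h2 = entropy_hist(y2[:, None], [bins])
    h12 = entropy_hist(np.column_stack([y1, y2]), [bins, bins])
    return h1 + h2 - h12

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 50_000))            # independent sources
x = np.array([[1.0, 0.6], [0.5, 1.0]]) @ s   # mixed signals
print("I(s1, s2) ~", mutual_information(s[0], s[1]))   # close to 0 (up to estimator bias)
print("I(x1, x2) ~", mutual_information(x[0], x[1]))   # clearly positive
```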

To minimize $I(\mathbf{y})$ with respect to $\boldsymbol{\theta}$, gradient-based methods may be applied. To construct such a method, it is very useful to know the "differential" of the mutual information, that is, its variation resulting from a small deviation in its argument. Such a nonparametric differential has recently been proposed [8] (see also Section 3).

The aim of this paper is to consider two general gradient-based approaches (called the "gradient" and "minimization–projection" approaches) for minimizing the mutual information of $\mathbf{y}$ in a model like (2). These new approaches, and model (2), are quite general and can be used in any mutual information minimization problem. In BSS, they provide a unifying framework and powerful tools for separating any complicated (yet separable) mixing model. As examples, we use these approaches for separating four separable mixing models (see Section 6). Moreover, it is shown in this paper that mutual information has no "local minimum" (see Theorem 2 and Remark 2 after the theorem).

The paper is organized as follows. In Section 2, multi-variate score functions and the score function difference (SFD) are introduced and their properties are discussed. Using the SFD, the "gradient" of mutual information is presented in Section 3. Using the properties of multi-variate score functions and the SFD, algorithms for estimating multi-variate score functions are proposed in Section 4. Two general approaches for minimizing mutual information are then introduced in Section 5. These approaches are used for the blind separation of the four separable models presented in Section 6: linear instantaneous mixtures in Section 7, linear convolutive mixtures in Section 8, post nonlinear (PNL) mixtures in Section 9 and convolutive post nonlinear (CPNL) mixtures in Section 10.

Section snippets

Multi-variate score functions

Multi-variate score functions were first introduced in [3] by extending the concept of the score function of a random variable to random vectors. The "gradient" of mutual information can be expressed in terms of these score functions (see Theorem 1). This section starts with a review of their definitions and continues by discussing their properties.

“Gradient” of mutual information

In this section, we use the multi-variate score functions defined previously to derive a "gradient" for mutual information. The variation of mutual information resulting from a small deviation in its argument (which may be called the "differential" of mutual information) is given by the following theorem [8].

Theorem 1 (Differential of mutual information)

Let $\mathbf{x}$ be a bounded random vector, and let $\boldsymbol{\Delta}$ be a small random vector of the same dimension. Then
$$I(\mathbf{x} + \boldsymbol{\Delta}) - I(\mathbf{x}) = E\{\boldsymbol{\Delta}^T \boldsymbol{\beta}_{\mathbf{x}}(\mathbf{x})\} + o(\boldsymbol{\Delta}),$$
where $\boldsymbol{\beta}_{\mathbf{x}}$ is the SFD of $\mathbf{x}$, and $o(\boldsymbol{\Delta})$ denotes higher-order terms in $\boldsymbol{\Delta}$.
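
To connect this differential to the parametric model (2), one can expand the output perturbation produced by a small parameter change and substitute it into the theorem. The following display is a sketch of that standard chain-rule step, under the assumption that $\mathcal{G}$ is differentiable with respect to $\boldsymbol{\theta}$; it is meant as a reading aid rather than a restatement of the paper's derivation.

```latex
% Sketch: from the differential of mutual information to a parametric gradient.
% Assumption: y = G(x; theta) is differentiable with respect to theta.
\begin{align*}
\boldsymbol{\Delta}
  &= \mathcal{G}(\mathbf{x};\boldsymbol{\theta}+\delta\boldsymbol{\theta})
     - \mathcal{G}(\mathbf{x};\boldsymbol{\theta})
   = \frac{\partial \mathcal{G}}{\partial \boldsymbol{\theta}}(\mathbf{x};\boldsymbol{\theta})\,
     \delta\boldsymbol{\theta} + o(\delta\boldsymbol{\theta}), \\
I(\mathbf{y}+\boldsymbol{\Delta}) - I(\mathbf{y})
  &= E\{\boldsymbol{\Delta}^{T}\boldsymbol{\beta}_{\mathbf{y}}(\mathbf{y})\}
     + o(\boldsymbol{\Delta})
   = \delta\boldsymbol{\theta}^{T}
     E\!\left\{
       \left(\frac{\partial \mathcal{G}}{\partial \boldsymbol{\theta}}\right)^{\!T}
       \boldsymbol{\beta}_{\mathbf{y}}(\mathbf{y})
     \right\} + o(\delta\boldsymbol{\theta}), \\
\text{hence}\quad
\frac{\partial I(\mathbf{y})}{\partial \boldsymbol{\theta}}
  &= E\!\left\{
       \left(\frac{\partial \mathcal{G}}{\partial \boldsymbol{\theta}}
       (\mathbf{x};\boldsymbol{\theta})\right)^{\!T}
       \boldsymbol{\beta}_{\mathbf{y}}(\mathbf{y})
     \right\}.
\end{align*}
```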

Estimating multi-variate score functions

To use the SFD for minimizing mutual information, it must first be estimated from the observed data. The marginal score function (MSF) of a random vector is nothing but the collection of the score functions of its components, and hence it can be estimated by any score function estimation method (see, for example, [14], [28], [30]). In this section, the estimation of the joint score function (JSF) and of the SFD is considered.
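
Since the estimators referred to here are not shown in the visible text, the sketch below illustrates one common way to estimate a marginal score function $\psi(y) = -\,\mathrm{d}\ln p_y(y)/\mathrm{d}y$ from samples: form a Gaussian kernel density estimate and differentiate it analytically. The kernel choice and bandwidth rule are assumptions for illustration, not the methods of [14], [28], [30].

```python
import numpy as np

def marginal_score_kde(samples, query, bandwidth=None):
    """Estimate psi(y) = -d/dy ln p(y) at the query points using a
    Gaussian kernel density estimate (analytic derivative of the KDE)."""
    samples = np.asarray(samples, dtype=float)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumed, standard choice).
        bandwidth = 1.06 * samples.std() * samples.size ** (-1 / 5)
    d = (query[:, None] - samples[None, :]) / bandwidth   # pairwise scaled differences
    k = np.exp(-0.5 * d**2)                 # unnormalized Gaussian kernels
    p = k.sum(axis=1)                       # proportional to the estimated density
    dp = (-d / bandwidth * k).sum(axis=1)   # proportional to its derivative
    return -dp / p                          # -p'(y) / p(y)

# Sanity check: for zero-mean unit-variance Gaussian data, psi(y) is close to y.
rng = np.random.default_rng(0)
y = rng.standard_normal(20_000)
q = np.linspace(-2, 2, 5)
print(marginal_score_kde(y, q))   # roughly [-2, -1, 0, 1, 2]
```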

Two general approaches for mutual information minimization

In this section, we propose two gradient-based approaches for minimizing $I(\mathbf{y})$ under the model of Eq. (2). Both approaches are based on Theorem 1, which provides the SFD as a nonparametric gradient of the mutual information.
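
For the special case of a linear separating matrix (so that $\boldsymbol{\theta} = \mathbf{B}$ and $\mathbf{y} = \mathbf{B}\mathbf{x}$), Theorem 1 gives $\partial I/\partial \mathbf{B} = E\{\boldsymbol{\beta}_{\mathbf{y}}(\mathbf{y})\,\mathbf{x}^T\}$, which suggests the simple steepest-descent loop sketched below. To keep the sketch self-contained, the joint-score part of the SFD is handled through the identity $E\{\boldsymbol{\varphi}_{\mathbf{y}}(\mathbf{y})\,\mathbf{x}^T\} = \mathbf{B}^{-T}$ (valid for an invertible linear mapping), and the marginal scores are replaced by a fixed $\tanh$ surrogate suitable for super-Gaussian sources; the step size, source distributions and iteration count are arbitrary, and this is not a reproduction of the paper's gradient or MP algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two independent super-Gaussian (Laplacian) sources, linearly mixed.
T = 20_000
s = rng.laplace(size=(2, T))
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])
x = A @ s

# Steepest descent on B for y = B x, using the approximation
#   dI/dB ~= E{tanh(y) x^T} - B^{-T},
# i.e. the marginal scores are replaced by tanh (super-Gaussian working density)
# and the joint-score term reduces to B^{-T} for an invertible linear mapping.
B = np.eye(2)
mu = 0.05
for _ in range(1000):
    y = B @ x
    grad = np.tanh(y) @ x.T / T - np.linalg.inv(B).T
    B -= mu * grad

# Up to scale and permutation, the global system B @ A should approach a
# scaled permutation matrix when separation has (approximately) succeeded.
print("global system B @ A:\n", B @ A)
```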

Some separable models for blind source separation

In the following sections, we will use the presented gradient and MP approaches for blind source separation. In BSS, a parametric model of the separating system is required, because general nonlinear mixtures are not separable [19] (i.e., output independence does not ensure source separation). Four separable models, for which separation algorithms will be addressed in the following sections, are:

(1) Linear instantaneous mixtures: This is the simplest case, in which the mixing system is $\mathbf{x} = \mathbf{A}\mathbf{s}$

Application to linear instantaneous mixtures

Here, as an illustration of the “gradient” and “MP” approaches, we utilize them for separating linear instantaneous mixtures. All of the algorithms and programs of this section are available as a MATLAB package with a graphical interface at

http://www.lis.inpg.fr/pages_perso/bliss/deliverables/d20.html

Application to convolutive mixtures

In this section, we show how the gradient and the MP approaches can be used in separating convolutive mixtures of two sources.

Application to PNL mixtures

In this section, we consider the application of the general mutual information minimization approaches of Section 5 (gradient and MP approaches) in separating PNL mixtures (Fig. 2).

Application to CPNL mixtures

We conclude the applications of the gradient and MP approaches by using them to separate CPNL mixtures of two sources. In CPNL mixtures, the separating system is composed of the linear FIR filter (48) and the nonlinear functions $g_i$. Consequently, the separation algorithms can be obtained by combining the algorithms obtained for convolutive and PNL mixtures.
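
As a purely structural illustration of such a separating system, the sketch below applies component-wise nonlinearities $g_i$ to the observations and then a two-channel FIR demixing filter. The cubic form of $g_i$, the filter length and the parameter values are assumed placeholders; they do not reproduce Eq. (48) or the parameters learned by the paper's algorithms.

```python
import numpy as np

def separate_cpnl(x, a, W):
    """Illustrative CPNL separating structure for two channels:
    component-wise nonlinearity g_i followed by an FIR demixing filter.

    x : (2, T)    observations
    a : (2,)      coefficients of the assumed form g_i(v) = v + a_i * v**3
    W : (L, 2, 2) FIR demixing taps, y(t) = sum_k W[k] @ g(x(t - k))
    """
    g_x = x + a[:, None] * x**3              # nonlinear compensation stage
    L, T = W.shape[0], x.shape[1]
    y = np.zeros_like(x)
    for k in range(L):                       # convolutive linear stage
        y[:, k:] += W[k] @ g_x[:, :T - k]
    return y

# Example call with arbitrary placeholder parameters.
x = np.random.default_rng(2).standard_normal((2, 1000))
a = np.array([0.05, 0.1])
W = np.zeros((3, 2, 2)); W[0] = np.eye(2)    # start from an identity filter
y = separate_cpnl(x, a, W)
```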

Conclusion

In this paper, we introduced the concept of the SFD, which can be considered as a nonparametric 'gradient' of the mutual information. We also showed that the SFD is a powerful tool for minimizing the mutual information of a parametric model, and we proposed two general approaches, called the gradient and MP approaches. These approaches give a unifying view and powerful tools for separating any complicated (yet separable) mixing model (for example, to the best of our knowledge there is currently no other method for

References (33)

  • P. Comon, Independent component analysis, a new concept?, Signal Processing, 1994.
  • C. Jutten et al., Three easy ways for separating nonlinear mixtures?, Signal Processing, February 2004.
  • J. Sole et al., Parametric approach to blind deconvolution of nonlinear channels, Neurocomputing, 2002.
  • L.B. Almeida, MISEP—linear and nonlinear ICA based on mutual information, J. Mach. Learn. Res., December 2003.
  • S.I. Amari, Natural gradient works efficiently in learning, Neural Comput., 1998.
  • M. Babaie-Zadeh et al., Separating convolutive mixtures by mutual information minimization.
  • M. Babaie-Zadeh et al., Blind separating convolutive post-nonlinear mixtures.
  • M. Babaie-Zadeh, C. Jutten, K. Nayebi, A geometric approach for separating Post Non-Linear mixtures, in: EUSIPCO, ...
  • M. Babaie-Zadeh et al., Using multivariate score functions in source separation: application to post non-linear mixtures, Scientia Iranica, 2002.
  • M. Babaie-Zadeh et al., Minimization-projection (MP) approach for blind source separation in different mixing models.
  • M. Babaie-Zadeh et al., Differential of mutual information, IEEE Signal Process. Lett., January 2004.
  • M. Babaie-Zadeh, C. Jutten, K. Nayebi, A minimization-projection (MP) approach for blind separating convolutive...
  • J.-F. Cardoso, Blind signal separation: statistical principles, Proc. IEEE, 1998.
  • J.-F. Cardoso et al., Equivariant adaptive source separation, IEEE Trans. Signal Process., December 1996.
  • T.M. Cover et al., Elements of Information Theory, 1991.
  • D.D. Cox, A penalty method for nonparametric estimation of the logarithmic derivative of a density function, Ann. Inst. Statist. Math., 1985.

    This work has been partially funded by the European project BLISS (IST-1999-14190), by Sharif University of Technology, and by Iran Research Telecom Center (ITRC).

1. Ch. Jutten is a professor at Université Joseph Fourier.
