To understand voice commands from an operator, user, or any other human, a robot must focus on a single source, acquire a clean speech sample, and recognize it. We propose a two-step approach to the deconvolution of speech and sound mixtures in the time domain. First, we apply a deconvolution procedure constrained so that the de-mixing matrix has fixed diagonal values with no non-zero-delay parameters, and we derive an adaptive rule for updating the deconvolution matrix. The individual outputs extracted in this first step may, however, still be self-convolved. We attempt to eliminate this corruption by a de-correlation process applied independently to each individual output channel.
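As a rough illustration of the first step (not the paper's actual adaptive rule, which is not reproduced here), the sketch below adapts only the cross-channel FIR filters of a two-channel time-domain de-mixer while the diagonal is held fixed at unity; the update is a generic symmetric-decorrelation rule that drives the cross-correlation of the outputs toward zero:

```python
import numpy as np

def demix_two_channel(x, L=8, mu=1e-3, n_iter=100):
    """Two-channel time-domain de-mixing sketch (illustrative only).

    The de-mixing matrix keeps a fixed unit diagonal: each output is its
    own input minus an adapted FIR cross-filter (length L) applied to
    the other input. The cross-filters w12, w21 are updated with a
    simple decorrelation-driven gradient rule standing in for the
    paper's derived adaptive rule.
    """
    x1, x2 = x
    N = len(x1)
    w12 = np.zeros(L)
    w21 = np.zeros(L)
    for _ in range(n_iter):
        # Outputs: subtract the cross-filtered other channel.
        y1 = x1 - np.convolve(x2, w12)[:N]
        y2 = x2 - np.convolve(x1, w21)[:N]
        # Push the lagged cross-correlations E[y1(n) y2(n-k)] toward 0.
        for k in range(L):
            w12[k] += mu * np.mean(y1[k:] * y2[:N - k])
            w21[k] += mu * np.mean(y2[k:] * y1[:N - k])
    y1 = x1 - np.convolve(x2, w12)[:N]
    y2 = x2 - np.convolve(x1, w21)[:N]
    return np.vstack([y1, y2])
```

Because only the cross-filters adapt, each extracted output can remain filtered by its own channel response, which is what the second, per-channel de-correlation step is meant to remove.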
Independent Component Analysis (ICA) can be used for single-channel audio separation if the mixed signal is transformed into the time-frequency domain and the resulting matrix of magnitude coefficients is processed by ICA. Previous work used only frequency (spectral) vectors and the Kullback-Leibler distance for this task. We propose new decomposition bases: time vectors and time-frequency components. The applicability of several different measures of distance between components is analysed, and an algorithm for clustering the components is presented. It was tested on mixtures of two and three sounds. The perceptual quality of the separation obtained with the proposed distance measures was evaluated in listening tests, which indicated the "beta" and "correlation" measures as the most appropriate. The "Euclidean" distance is shown to be appropriate for sounds with varying amplitudes. The perceptual effect of the amount of variance retained was also evaluated.
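To illustrate the clustering step, the sketch below assumes a "correlation" distance of the hypothetical form 1 - |Pearson correlation| between component time envelopes (the paper's exact measure definitions and clustering algorithm are not reproduced here) and groups components with a simple greedy threshold rule:

```python
import numpy as np

def correlation_distance(a, b):
    """Assumed 'correlation' distance: 1 - |Pearson correlation|.

    Components whose time envelopes rise and fall together (and hence
    likely belong to the same source) get a distance near 0.
    """
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 1.0
    return 1.0 - abs(np.dot(a, b) / denom)

def cluster_components(envelopes, threshold=0.3):
    """Greedy clustering of ICA components by envelope distance.

    Each component joins the first cluster whose representative lies
    within `threshold`; otherwise it starts a new cluster. Components
    grouped into one cluster would then be summed to reconstruct one
    separated source.
    """
    clusters = []  # lists of component indices
    reps = []      # representative envelope per cluster
    for i, env in enumerate(envelopes):
        for c, rep in enumerate(reps):
            if correlation_distance(env, rep) < threshold:
                clusters[c].append(i)
                break
        else:
            clusters.append([i])
            reps.append(env)
    return clusters
```

With such a distance, a scaled copy of an envelope lands in the same cluster as the original, while an envelope oscillating at an unrelated rate starts a new cluster; the "beta" and "Euclidean" measures would slot into the same clustering loop in place of `correlation_distance`.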