[Gta04-owner] Speex echo cancelation now working?

NeilBrown neilb at suse.de
Mon Apr 16 14:19:23 CEST 2012


On Mon, 16 Apr 2012 10:29:02 +0200 Radek Polak <psonek2 at seznam.cz> wrote:

> On Monday, April 16, 2012 02:59:31 AM NeilBrown wrote:
> 
> > Hi Radek,
> > 
> >  I finally got up to the stage of making real phone calls on my GTA04 and
> >  this was very helpful!  Thanks.
> 
> Nice to hear that it works for you.
> 
> >  I've been examining it to make sure I understand what is happening and
> > there are a few peculiarities.  The thing that stood out for me was the
> > apparent need to set start_threshold so high.  I would have thought we want
> > to start playing samples as soon as possible, but the setting you use
> > doesn't start playing until the buffer is full.
> >  So I tried reducing it and got terrible clicks and over-runs (as I'm sure
> >  you know).
> 
> Yup, it took me most of the time to figure this out. My first tries were
> without the threshold. This worked on my notebook but not on the GTA04. Then
> I looked at the aplay sources and found this threshold param, and it started
> working (btw they set it in aplay too).
> 
> >  Continuing exploration found two more interesting things.
> > 
> >  1/ At the point where you do echo cancellation, the two input buffers are
> >     different ages.  One was captured just recently (over the last 32ms)
> >     while the other was captured before that (between 64 and 32 ms ago).
> >     You can show this by calling snd_pcm_delay() on each handle (r0.handle
> >     and r1.handle).  I found that r0.handle was consistently 256 samples old
> > while r1.handle was fresh.
> > 
> >     The sound devices don't actually start capturing until the first read()
> >     call (or a call to snd_pcm_start()).
> >     Your code repeatedly reads from the GSM source until it gets a
> > successful read, then it reads from the microphone.  So we don't start
> > recording from the microphone until we already have a 32ms buffer (256
> > samples) from the GSM source.  This means we are always 32ms out of sync.
> > 
> >     This can easily be addressed by inserting:
> > 
> >      while (route_stream_read(&r1))
> >             ;
> > 
> >     before starting the "while (!terminating) {" loop.
> 
> I thought I was reading from both of the cards at the beginning:
> 
>     while (!terminating) {
>         if (route_stream_read(&r0)) {	<==== internal

This read request (the first time through the loop) tells the sound device to
start recording. It will collect samples for 32ms and then return the buffer
with 256 frames in it.

>             blink_aux();
>             continue;
>         }
> 
>         rc = route_stream_read(&r1);	<==== umts

Now we tell the other sound device to start recording (when the first one has
already been recording for 32ms).   It will collect samples for 32ms and then
return a buffer with 256 frames in it.
During this time the internal sound device was still recording and has
another 32ms of sound buffered.  But we don't look at that yet.

So the time-of-recording of the two buffers we have at this point differ by
32ms.
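
The ordering described above can be reproduced with a toy model. This is only
a sketch: sim_dev, sim_read and capture_skew_ms are made-up names, not part of
the routing program or of ALSA, and the model assumes a device starts
capturing on its first read and that a read blocks until one full period is
available.

```c
#include <stdbool.h>

#define PERIOD_MS 32  /* 256 frames at 8000 Hz */

/* Toy capture device (hypothetical): recording starts on the first read. */
struct sim_dev {
    bool started;
    long start_ms;      /* time at which the first read was issued */
    long periods_read;  /* periods already handed to the caller */
};

/* Blocking read of one period.  Advances *now_ms to the time the read
 * returns and returns the time at which the returned period began. */
static long sim_read(struct sim_dev *d, long *now_ms)
{
    if (!d->started) {
        d->started = true;
        d->start_ms = *now_ms;
    }
    long begin = d->start_ms + d->periods_read * PERIOD_MS;
    long ready = begin + PERIOD_MS;
    if (*now_ms < ready)
        *now_ms = ready;  /* block until the period is complete */
    d->periods_read++;
    return begin;
}

/* Run `loops` iterations of "read r0, then read r1" and return the skew in
 * ms between the two buffers of the last iteration.  With prime_r1 set, one
 * period is read from r1 and discarded before the loop starts. */
long capture_skew_ms(bool prime_r1, int loops)
{
    struct sim_dev r0 = {0}, r1 = {0};
    long now = 0, skew = 0;

    if (prime_r1)
        (void)sim_read(&r1, &now);

    for (int i = 0; i < loops; i++) {
        long t0 = sim_read(&r0, &now);
        long t1 = sim_read(&r1, &now);
        skew = t1 - t0;
    }
    return skew;
}
```

In this model the skew stays at one period (32ms) on every pass through the
loop, and drops to zero once one period is read from r1 and thrown away
before the loop. Real devices will not be this tidy, of course.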

> 
> 
> >     Doing this discards the first full period received from the GSM
> >     source, but allows the two streams to be more in-sync: The
> >     snd_pcm_delay is the same for both. This might allow us to reduce the
> >     size of the 'tail' given to speex_echo_state_init() which is higher than
> > it should need to be.
> 
> Yes, the current tail of 8192 looks too high to me. But if I used smaller
> values the other side started to hear some artifacts, and with small values
> even the full echo.
> 
> As far as I understand it, the tail should be very small: we want to process
> the sound played on the earpiece and remove it from the sound recorded by the
> internal sound card. Shouldn't 1 or 2 periods be enough here?

The length of the tail is a function of the total delay which has 2 parts:
 1- the difference between the time when the original period was played and
    the time when the received period was recorded
 2- the time for sound to travel from the speaker to the microphone.

When using the handset speaker-mic there should be no echo in the room (off
walls) so the only sound-path would be directly from speaker to microphone, which
would be very short (less than 1 msec). So if we can make sure that the
"echo frame" passed to speex_echo_cancellation was played at exactly the same
time that the "input_frame" was recorded, then a very short tail (1 or 2
periods) should suffice.
However I think there is currently a big difference (500ms?) between when the
one frame was played and when the other frame was recorded.
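
To put rough numbers on this, here is a small sketch that converts between a
delay and a tail length. The helper names tail_ms and tail_frames are mine
(hypothetical), and I am assuming the tail is counted in samples at 8000 Hz
and is best kept to a whole number of 256-sample periods.

```c
/* Length in milliseconds of a tail of `frames` samples. */
long tail_ms(long frames, int rate_hz)
{
    return frames * 1000 / rate_hz;
}

/* Smallest whole number of periods (returned as a sample count) that covers
 * `delay_ms` of delay; never less than one period. */
int tail_frames(int rate_hz, int delay_ms, int period_frames)
{
    long frames = ((long)rate_hz * delay_ms + 999) / 1000;        /* round up */
    long periods = (frames + period_frames - 1) / period_frames;  /* round up */
    if (periods < 1)
        periods = 1;
    return (int)(periods * period_frames);
}
```

With these, tail_ms(8192, 8000) comes to 1024, so the current tail covers
about a second. tail_frames(8000, 500, 256) gives 4096 for a 500ms delay, and
tail_frames(8000, 1, 256) gives a single period of 256 for the 1 msec
acoustic path alone.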



> 
> >     It seems that reading from the 'microphone' device sometimes takes well
> >     over 50ms which is much too long considering that each period is only
> >     32ms long.
> >     One read will take 55msec, the next (starting another 8 msec later due
> > to the other processing that happens) takes less than one msec.
> > 
> >     So it seems that the sound device is waiting until two periods have been
> > recorded before returning anything.  Then it returns the first and the
> > second is immediately available.
> > 
> >     So either we have something wrong in the configuration, or there is a
> > bug somewhere.
> 
> This could be the reason why the sound routing program does not work in SHR.
> They are using dmix by default in /etc/asound.conf and they reported that the
> routing program does not work for them.

I considered playing with dmix briefly but I think it would add too many
problems.  It adds another level of indirection which makes it that much
harder to get the timing right.

I suspect you might be able to make it work if you set the period_size and
the buffer_size of the dmix slave device to match what the upper levels are
using.
See http://www.alsa-project.org/main/index.php/Asoundrc#dmix for how you set
period_size and buffer_size for dmix devices.
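
Something along these lines is what I have in mind. It is untested. The
period_size of 256 and rate of 8000 match the numbers above, but the device
name gta04_dmix, the slave "hw:0,0", the ipc_key and the buffer_size of 1024
are guesses that would need adjusting to the real setup.

```
pcm.gta04_dmix {
    type dmix
    ipc_key 1024            # any key unique on the system (assumed value)
    slave {
        pcm "hw:0,0"        # assumed hardware device
        rate 8000
        period_size 256     # one 32ms period, as used by the routing program
        buffer_size 1024    # assumed; should match what the program requests
    }
}
```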

But I'd want to make it work well on raw devices before adding any plugins
into the mix.


> 
> Also, I couldn't get the sound routing working with pulseaudio, which
> worked well on a PC.
> 
> > I probably won't have time to play with this for a few days, so I thought
> > I'd explain where I was up to in the hope that someone else might like to
> > try experimenting.
> 
> I won't have much time for it either.
> 
> > My main goal is to be able to increase the volume on calls.  If I try
> > that, I start getting really bad echo. 
> 
> You or the other side? The echo cancellation is performed only for the other 
> side. The other side should perform the echo cancellation for you.

Yes, the other side.  I don't think I get much echo - certainly not enough to
worry me.  But I don't want to impose echo on the person I'm talking to.

> 
> > I'm hoping that if we can sort
> > out the timing issues so that there is less delay between record and
> > play, then the echo cancellation might be able to do a better job.
> 
> Yup, that would be nice. For me it currently works very well, but any
> improvements will be nice.
> 
> Btw we had a talk with Joerg on IRC about the voice routing program. There is
> an interesting question about the different clocks in the internal and umts
> sound cards. They should be synced, but somehow it works now even without
> syncing - or doesn't it?

It seems to...

I guess the actual difference between the clocks is small enough that we
don't notice.

Each clock is responsible for 8000 samples per second.
If one of them only does 7999, then after 256 seconds (a little over 4
minutes) you would expect to lose a whole 256-sample period, resulting in a
bad click.

I found a quote at http://www.seventhstring.com/tuningfork/tuningfork.html
which says "A quick look around the web shows that sound card clock accuracy
of 50ppm (parts per million) is considered pretty good"

Assuming this is true, then that is 1 in 20,000, or one 32ms click in 640
seconds (just over 10 minutes).

The variance may be much less than 1:20000, though it could be a bit more.
So it seems reasonable to expect a click every 5-15 minutes or so.
Have you had a long enough conversation to see if that might be the case?

Once we get the other timing issues clarified, it probably won't be very hard
to measure the clock drift and insert/delete samples to keep them matched -
if it turns out to be a problem.
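
The arithmetic above, as a small sketch. The function names are made up for
illustration, and the only assumption is that a click happens once the two
streams have drifted a full period apart.

```c
/* Sample-rate offset in Hz for a clock that is `ppm` parts per million off. */
double ppm_to_rate_offset_hz(double rate_hz, double ppm)
{
    return rate_hz * ppm / 1e6;
}

/* Seconds until two clocks drift one full period apart.
 * Returns -1.0 if the rates are identical (they never drift). */
double seconds_until_period_slip(int period_frames,
                                 double rate_a_hz, double rate_b_hz)
{
    double diff = rate_a_hz > rate_b_hz ? rate_a_hz - rate_b_hz
                                        : rate_b_hz - rate_a_hz;
    if (diff == 0.0)
        return -1.0;
    return period_frames / diff;
}
```

seconds_until_period_slip(256, 8000, 7999) gives 256 seconds, and a clock
that is 50ppm off against an exact 8000 Hz clock gives about 640 seconds.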


NeilBrown