SE-402 RESEARCH REPORT 1

Experiments in Single-Sensor Acoustic Localization
Seth Pollen and Nathan Radtke
pollens@msoe.edu, radtken@msoe.edu

Seth Pollen and Nathan Radtke are undergraduate software engineering students at the Milwaukee School of Engineering in Milwaukee, Wisconsin.

Abstract—This research project explores the area of tangible acoustic interfaces as an alternative to traditional touch screens. Past research has shown the feasibility of single-sensor acoustic localization. The weakness of existing solutions is the high effort required for calibration. In this paper, we present our attempts to create a low-cost, single-sensor acoustic localization solution, which employs algorithms intended to reduce the user effort in calibration. We provide the results of our implementation and present future research opportunities for the area of tangible acoustic interfaces.

Index Terms—Tangible acoustic interfaces, touch screens, acoustics, neural networks, signal processing

1 PROBLEM STATEMENT
Touch interfaces to computer devices have been the focus of significant innovation in recent years, but mainstream touch screen technologies remain expensive and do not scale easily to large sizes or complex shapes. Thus, while touch surfaces could find numerous applications in everyday life, they are, for the most part, currently limited to traditional computing devices such as cell phones and PCs.

One technology which promises to make touch surfaces more ubiquitous and economical relies on acoustic sensors to detect impacts (such as taps from a fingertip or solid object) on a surface by sensing the vibrations caused by those impacts. This technology, known as tangible acoustic interfaces (TAI), is much simpler, cheaper, and more power-efficient than existing technologies, and it scales very well to large and complex (non-planar) surfaces. This opens up the possibility of making nearly any solid object into a responsive touch interface [1].

The goal of our project is, first, to explore TAI solutions using a single, inexpensive sensor and, second, to reduce the calibration effort required for such systems without reducing the system's touch input resolution. This improvement will require some kind of interpolation that allows the system to localize more precisely those taps that do not coincide with calibration points. To reduce the cost of our solution, we also hope to use an ordinary laptop audio line-in to digitize the incoming signal, allowing the rest of the signal processing to take place in software. Finally, we want to improve the usability of the system when a projector display is overlaid onto the touch surface, allowing calibration to take place interactively.

2 PREVIOUS RESEARCH
Significant research into tangible acoustic interfaces took place in 2005 and 2006 as part of the TAI-CHI project (www.taichi.cf.ac.uk). This project identified three major algorithmic approaches to the design of TAIs: time delay of arrival (TDOA), time reversal (TR), and acoustic holography [2]. Of these three, the time reversal approach has the special advantage of operating with as few as one acoustic sensor.

A single-sensor solution has many advantages. The sensor can be positioned almost anywhere on the physical object being touched, eliminating the potential for human error in positioning multiple sensors relative to one another. Furthermore, a single sensor permits applications to use existing sound input hardware (available on most personal computers) to receive the incoming signal, allowing all signal processing to be done in software [3].

The time reversal technique relies on the assumption that impacts at different locations on the sensitive object will produce unique acoustic patterns in the object (through reverberations off the object's boundaries as well as different wave propagation modes), which can be received by a single sensor and distinguished from one another in software. Calibration of a time-reversal system, which involves tapping various points on the surface to provide an initial data set to the time reversal algorithm, allows the system to store the particular acoustic characteristics of the medium and sensor placement.

However, the research done on the TAI-CHI project left some problems unsolved. Most importantly, its time-reversal solutions are entirely discrete. The existing literature does not appear to provide methods for interpolating between calibration points, so a large number of calibration points is required to achieve a resolution that appears continuous to users. The literature also seems to lack significant inquiry into methods for distinguishing valid impacts on the sensitive area from extraneous noise or from impacts to other areas of the object not intended
to be sensitive. In this paper, we discuss our efforts to integrate interpolation and noise exclusion into existing single-sensor acoustic localization techniques.

3 FOUR-POINT ALGORITHM
The first software technique we investigated deviates from existing single-sensor algorithms by making significant assumptions about the acoustic medium. Based on those assumptions, it attempts to model the acoustic transfer function across the whole medium. This algorithm is named "four-point" because of our initial hope that it would achieve acceptable localization across the entire sensitive area based only on calibration data from the four corners of that area. While we did not find success with this algorithm, its outcomes informed our further research and are discussed here.

3.1 Assumptions and Theory
For this algorithm, the sensitive object, or medium, is assumed to be an ideal acoustic conductor with a rectangular shape and smooth, straight edges. Under these assumptions, a burst of sound produced by an impact propagates through the medium without being altered and reverberates off the edges of the medium until it is fully attenuated. A sensor attached to the medium receives, in addition to the original sound pulse, a series of echoes of that pulse, corresponding to all the linear reflected paths between the impact point and the sensor. Part of the four-point algorithm reduces each received tap to a series of timestamps describing the delays between successive echoes. Assuming a constant speed of sound in the medium, these timestamps can be used to reason about the lengths of the linear paths followed by the sound waves, and from that reasoning the location of the impact may be derived.

In addition to assumptions about the acoustic properties of the medium, this approach also assumes that the impact itself consists of a brief burst of sound followed by silence. The duration of this initial burst must be shorter than the time delay of echoes in the medium so that echoes do not overlap and obscure one another in the received signal.

3.2 Tap Identification
A threshold-detection algorithm is used to identify impacts in the signal coming in through the sound card. Each time the audio signal crosses a certain threshold h, a snapshot of the signal is taken with L samples prior to the threshold-crossing sample and T samples following it; this snapshot is then sent to the next stage of processing. After one snapshot is created, new snapshots are not returned until an interval of P samples passes without any sample crossing the threshold. This ensures that an impact followed by a particularly long string of echoes is not interpreted as two separate impacts. Based on our experiments, the following parameter values were found to be the most adaptable and robust:

h = 0.015
L = 25
T = 575
P = 3000

With these settings, the software is able to correctly identify taps on a variety of surfaces (chalkboards and tabletops) and with a variety of tap devices (rubberized bolts, human knuckles, and chalk). Note that these settings depend on the capabilities of our sound card; they are designed for operation with an amplitude range of ±1.0 and a sample rate of 44.1 kHz. These settings for T and L mean that each window of data prepared for further processing is 600 samples (13.6 ms) long, and the minimum spacing between impacts is 3,025 samples (68.6 ms).

3.3 Quantization
The next step in the four-point algorithm is the quantizer, which converts incoming sound waves into a discrete time series of echoes. Several signal-processing approaches to quantization were considered, including conversion to the frequency domain. We rejected frequency-based techniques, however, because superimposing echoes in a single signal affects its amplitude far more than its frequency content.

While several variations were tried, all of our approaches to the quantizer consisted roughly of two steps, illustrated in Figure 1. First, to smooth out the oscillations in the incoming sound wave, the signal is squared and filtered using a low-pass FIR filter with a cutoff around 640 Hz, and the square root is then taken to restore the signal to its original scale. Note that the spectrum of a typical chalkboard tap peaks around 3 kHz, well above this cutoff, so the filter removes the oscillations themselves and keeps only their slowly varying envelope. In the figure, the red line shows the original sound data captured by our microphone, and the blue line shows the curve produced by this smoothing technique. This smoothed curve provides a continuous approximation of the amplitude of the sound wave. In the second step of quantization, a median filter is used to detect peaks in the smoothed curve. When it encounters a peak of the correct width, the median filter outputs a flat plateau, which can easily be identified by examining the differences between successive samples. The positions of these detected peaks (illustrated as black boxes in the figure) yield the timestamp series of echoes.

It is possible, however, for this algorithm to detect waveform features not actually caused by echoes in the acoustic medium. The quantizer detects any fluctuation of the correct duration in the smoothed curve, no matter how slight its amplitude. Therefore, another algorithm step was added: it filters out all detected peaks except for a constant number of peaks having the highest absolute
MAY 27, 2011 3

amplitudes; it was assumed that higher-amplitude peaks were more likely to represent actual echoes. A further refinement was also tried, which multiplies each peak's amplitude by a constant power (usually less than 1) of its time relative to the start of the window and then takes the highest peaks. This compensates for the fact that peaks naturally decrease in amplitude over time, due to attenuation of the acoustic signal.

Fig. 1. Steps of echo quantization.

3.4 Geometric Calibration
While we spent significant time testing and tuning the quantizer, we were never able to move on to validating the four-point calibration algorithm, which uses the timestamp series produced by quantization to reason about the physical extent of the medium and the locations of impacts. Nevertheless, we did draft an initial solution to this problem. This proposal collects a set of calibration data, consisting of coordinate points on the touch area paired with the quantized time series produced when each point was tapped. This information, combined with the known position of the sensor, is used to derive the physical dimensions of the acoustic medium. To do this, the medium is assumed to be rectangular with edges parallel to the edges of the designated touch-sensitive area (which may or may not occupy the entire physical medium). The calibration process begins by proposing that the medium does not extend past the edges of the sensitive area. The available calibration data set is then searched for a timestamp series that would not have been possible given its impact location, the sensor location, and the currently proposed medium boundaries. If such a calibration tap is found, the appropriate proposed boundary of the medium is extended by a small amount, and the same process is run again. The intent was that this method of successive approximation would eventually converge on the correct medium size and terminate.

Once calibration has determined the physical extent of the medium, a simple algorithm can use this information to localize incoming taps by simulating the propagation of an impact at various points in the medium and selecting the point whose simulation results most closely match the observed timestamp series.

3.5 Problems with Four-Point
Although significant effort was spent tuning the four-point algorithm, it was eventually deemed unworkable. The biggest problem was the inconsistency of the quantizer. Running our quantizer on data recorded from different taps at the same point on the surface produced significantly different timestamp series. This instability in quantization was probably a result of our oversimplification of the acoustic problem. It is quite possible that the sound wave received by our sensor is the combination of several different wave propagation modes which differ from our assumed model of simple reflections [4]. In any case, unstable quantization proved disastrous for our geometric calibration technique. Because our calibration technique considers each calibration sample (that is, each series of timestamps) individually when negotiating the physical dimensions of the medium, any error in the quantizer output could cause the calibration algorithm to fail to converge or to produce wildly incorrect estimates of the medium's size.

While several workarounds were proposed for these problems, a lack of time and understanding led us to abandon the four-point algorithm and pursue localization techniques which required fewer assumptions. It may be possible, however, to improve the four-point algorithm by requiring the user to provide additional data during calibration. We could, for example, prompt the user to measure the physical dimensions of the medium so that our system does not have to derive this information from sound samples using such an unstable algorithm. Our calibration algorithm could also be improved to consider multiple taps simultaneously when reasoning about the size and characteristics of the medium; this would make it less sensitive to variations in individual calibration taps.

4 NEURAL NET ALGORITHM

4.1 Choosing a Neural Network
As we pursued the four-point quantizer approach, we also researched the use of neural networks. Our desire was to calculate continuous output for localization. We discovered that neural networks may be a solution for this problem, as they are commonly used to approximate complicated functions.
Our research efforts led us to investigate neural networks that use a supervised learning mechanism. This meant that users would need to collect data which, in turn, would be used to train the neural network. The two neural network types we considered suitable for our problem were the feed-forward back-propagation network and the radial basis function network. Our goal was to train the network and then use it as a black box for localizing taps between calibration points. Our expected output was a coordinate pair corresponding to a location on the user's screen.

4.2 Implementation Approaches
Our implementation efforts started by looking for a C# API that implemented the feed-forward back-propagation network. We looked for this specific network type because it has many common applications. Of the many open-source neural network APIs we found, AForge.NET [5] and FANN [6] had very good documentation and appeared suitable for our needs.

Getting the neural network to work proved to be a difficult task. At first it was unable to train and reach our performance goal, defined as a maximum allowed error in the neural network output. Initially, we concluded that our performance goal error was too small and thus unattainable. We experimented by raising the tolerance of the performance goal, but found that the root of the issue was that the neural network was unable to make training progress in the time we were allowing it to train. The training time we allowed varied from several seconds to several minutes. Allowing more time for training would, we felt, have harmed the user experience. Additionally, each time the system is set up it may be configured differently (sensor location, screen resolution, calibration density, etc.), so we consider calibration data non-portable; therefore, calibrating for several minutes during each setup is unrealistic and could discourage users from using the system.

Our lack of results with the neural network libraries led us to reevaluate the feasibility of using a neural network for tap localization. Through additional research, we determined that this was an appropriate use of a neural network, so we continued to investigate their capabilities. We discovered MATLAB's neural network toolbox and employed it extensively to test and prototype neural networks.

4.3 Testing and Prototyping Neural Networks
The testing and prototyping of neural networks became more rigorous with the transition to MATLAB. Aided by custom functions and scripts, we were able to record data in wave files and process them directly in the MATLAB environment, which made the testing process more efficient and automated. We developed a standard process to test neural network features by adjusting parameters in the network script and running a validation test to see how well the configuration performed.

4.4 Processing Theories
By the time of testing in MATLAB, we had gained a more formal introduction to neural networks in an artificial intelligence course. With this knowledge, we began to discover some of the missteps taken during setup of the neural networks. This ultimately led to better and more thorough analysis.

Moving to MATLAB, we were encouraged by training times much shorter than those we had expected from our trials in C#. Typical training times were on the order of 60 seconds. Some network configurations took vastly longer and reached the 10-minute timeout threshold, but these cases were rare. A recurring problem was overfitting the network to the training data. When the training goals were reduced, no improvement was observed.

At this point, we also changed our goal for the neural networks. Instead of training the network to produce a coordinate pair when given a waveform, we pursued a network that could generate a waveform from a coordinate pair, thereby simulating the acoustics of the medium. We understood that this was backwards from normal neural network operation, but we decided to investigate further. The idea was that a neural network would be able to generate additional calibration data to assist the time-reversal algorithm (see section 5), thereby reducing the calibration effort required of the user. The generated calibration data would fill in the gaps between calibration points and increase the resolution of the system.

4.5 Results
Overall, we obtained mixed results with neural networks and no consistent improvement. Our best results running the neural network in the forward direction were with the radial basis function network. Training was very quick, and the calibration points stayed consistent in testing. Points distant from the calibrated points generally performed poorly.

The radial basis function network in the reverse direction yielded interesting results. We found that smoothed wave generation was possible. However, the results were inconsistent, and they led us to suspect that our tapping device was producing inconsistent data.

Later we found that the inconsistent tap device and data may not have been the sole cause of poor performance. We found in an experiment that calibration data with tighter spacing yielded better results. The calibration point spacing used for testing the neural networks was two to three times greater than what we found to be optimal (see section 5.2). When we discovered this, however, we had already abandoned the neural network
solution to shift our focus to advancing the rest of the project. Given time constraints, we were unable to revisit the neural network solution.

5 TIME REVERSAL ALGORITHM
In contrast to our four-point and neural net approaches, the time reversal algorithm has already been the subject of extensive research [7] and has even been proposed as part of a consumer product [8]. It makes only one assumption about the acoustic properties of the medium, namely, that impacts at different locations will produce distinct acoustic patterns. During calibration, a set of known points is tapped and the resulting sound waves are stored. Then, to localize a new tap, its sound wave is compared against the stored calibration waveforms using some matching function, and the calibration point whose waveform best matches the new tap is returned as the result of the localization algorithm.

The name "time reversal" comes from the underlying theory that the waveform received by the sensor, if time-reversed and re-emitted into the medium at that point, would reproduce the original impact waveform at the original impact location.

5.1 Choice of Matching Function
The first task in developing a time reversal solution is choosing the function used to evaluate the match between two waveforms. Prompted by previous research, we evaluated several candidate functions, including the Pearson correlation coefficient [9] and a technique based on cross-correlation [7]. The first technique uses the traditional formula for Pearson's r-value to measure the match between two vectors $A$ and $B$ of samples taken from the audio input hardware:

$$ r_1 = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{A_i - \bar{A}}{s_A} \right) \left( \frac{B_i - \bar{B}}{s_B} \right) $$

where $A_i$ is the $i$th sample of vector $A$, $\bar{A}$ is the mean of $A$, $s_A$ is the standard deviation of $A$, and similar definitions hold for $B$.

The second matching technique uses the maximum value achieved by the cross-correlation of the two input vectors:

$$ r_2 = \max_{-n \le t \le n} \left( \sum_{i=-\infty}^{\infty} A_i B_{i+t} \right) $$

where $A_i$ and $B_i$ are defined to be 0 for all indices $i \notin [1, n]$. This technique effectively searches along the time axis for a shift value $t$ that brings the two signals into the closest agreement. This is an advantage over the Pearson technique used for $r_1$, which has no tolerance for time-shifted signals. However, unlike the Pearson technique, this cross-correlation is influenced by the loudness of the signal as well as its shape. Thus, louder signals will tend to produce higher $r_2$ values, even if they do not match well.

Our final matching technique, which provides the best results, is based on cross-correlation but adds the statistical normalization used by Pearson's technique to compensate for varying loudness in the input signals:

$$ r_3 = \frac{1}{n-1} \max_{-n \le t \le n} \left( \sum_{i=-\infty}^{\infty} \left( \frac{A_i - \bar{A}}{s_A} \right) \left( \frac{B_{i+t} - \bar{B}}{s_B} \right) \right) $$

where, as for $r_2$, terms with indices outside $[1, n]$ are taken to be 0. The value produced by this formula always lies in the interval $[0, 1]$, with 1 indicating a perfect match between the two vectors. Note that, because of the statistical normalization performed on the two vectors, this technique ignores their relative magnitudes.

Matching functions were evaluated by running them on two waveforms produced by tapping the same point on a chalkboard and on two waveforms produced by tapping different points. The function that best differentiated between taps at the same location and taps at different locations was considered the best.

Our implementation of the time-reversal algorithm uses the same threshold-crossing algorithm described in section 3.2 for detecting impacts. As noted in section 3.2, a window size of 600 samples (at a 44.1 kHz sample rate) is used to represent each impact. Larger window sizes (up to 1200 samples) were tried, but localization results did not improve significantly. Moreover, larger windows degrade runtime performance by requiring correlation to be computed over more samples. Various preprocessing functions were applied before computing the cross-correlation match for two waveforms, including the Fourier transform and a 640-Hz low-pass filter (to smooth out oscillations in the signal). None of these preprocessing functions improved correlation results, so they were eliminated.

5.2 Calibration
The time-reversal algorithm requires more calibration effort than approaches that model the medium, such as four-point. This is because time reversal makes no attempt to solve or model the acoustic properties of the medium; it must therefore have calibration samples from points covering the entire sensitive area.

All points in the designated sensitive area must be within a certain distance of a calibration point so that the time-reversal algorithm can use impact data gathered from calibration points to approximate the acoustic profile of all other possible impact locations. Thus, calibration points are usually arranged in a grid covering the area of interest. The most important aspect of the calibration grid is the physical spacing of these points on the sensitive area. Interestingly, it is possible to calculate an upper bound for this spacing from the acoustic properties of the medium, such as the wavelength of the waves produced by impacts; see [7].

We experimented with several different tap surfaces, including blackboards, solid wood tables, and composite
wood tables. We generally found an acceptable calibration point spacing to be between 3 and 6 inches; this ensures that our discrete localization algorithm (see section 5.4) always matches a tap on the sensitive area to one of the nearest calibration points. Wider spacing of calibration points tends to degrade both localization schemes (see sections 5.4 and 5.5).

Fig. 2. Discrete matching results.

5.2.1 Point Refinement
To improve the integrity of calibration data, a technique called point refinement was adopted. During calibration, each calibration point is tapped three times, yielding three sample signals. These samples are then matched to one another using our chosen matching function. The sample with the highest correlations to the other two samples is retained in the final calibration data set, while the other two samples are discarded. This makes calibration more robust by enabling the system to identify and discard bad data, which could be caused by extraneous noise or by some other impact on the sensing surface not intended as a calibration tap. It also improves the runtime performance of localization by reducing the number of correlations that must be computed against the calibration data set.
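Point refinement depends on the matching function chosen in section 5.1. The Python sketch below implements $r_1$, $r_2$, and $r_3$ as defined there and then uses $r_3$ to decide which of three taps to keep. It is an illustration, not the project's code: NumPy is assumed, `refine_point` is a hypothetical helper, scoring each tap by the sum of its matches against the other taps is our reading of "highest correlations to the other two samples", the standardized signals are zero-padded outside the window, and the test signals are synthetic.

```python
import numpy as np

def _standardize(x):
    """Subtract the mean and divide by the sample standard deviation (ddof=1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def _max_over_shifts(a, b):
    """max over t of sum_i a[i] * b[i + t], with zero padding outside the window.

    np.correlate(b, a, "full")[k] = sum_i b[i + k] * a[i] for shifts -(n-1)..(n-1).
    Shifts of exactly -n or n have no overlap and contribute 0, hence the floor.
    """
    return max(float(np.max(np.correlate(b, a, mode="full"))), 0.0)

def pearson_r1(a, b):
    """r_1: Pearson's r, evaluated at zero shift only."""
    za, zb = _standardize(a), _standardize(b)
    return float(np.sum(za * zb)) / (len(za) - 1)

def xcorr_r2(a, b):
    """r_2: peak raw cross-correlation; grows with signal loudness."""
    return _max_over_shifts(np.asarray(a, dtype=float), np.asarray(b, dtype=float))

def norm_xcorr_r3(a, b):
    """r_3: peak cross-correlation of standardized signals, in [0, 1]."""
    za, zb = _standardize(a), _standardize(b)
    return _max_over_shifts(za, zb) / (len(za) - 1)

def refine_point(samples, match=norm_xcorr_r3):
    """Hypothetical helper: keep the tap whose summed match to the others is highest."""
    scores = [
        sum(match(s, other) for j, other in enumerate(samples) if j != i)
        for i, s in enumerate(samples)
    ]
    return samples[int(np.argmax(scores))]

# Synthetic taps of one calibration point: 600 samples at 44.1 kHz, 3 kHz tone.
t = np.arange(600)
tap = np.exp(-t / 80.0) * np.sin(2 * np.pi * 3000 * t / 44100)
louder_late = 2.0 * np.concatenate([np.zeros(5), tap[:-5]])  # same tap, shifted and louder
noise = np.random.default_rng(0).normal(size=600)            # a bad calibration sample

kept = refine_point([tap, louder_late, noise])
print(pearson_r1(tap, louder_late), norm_xcorr_r3(tap, louder_late))
```

For a signal `x`, `norm_xcorr_r3(x, 2 * x)` evaluates to 1 (up to rounding), while `xcorr_r2(x, 2 * x)` is exactly twice `xcorr_r2(x, x)`, which is the loudness sensitivity described for $r_2$. The five-sample shift between `tap` and `louder_late` lowers the zero-shift Pearson value but not $r_3$, and the noise sample is never the one kept.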
5.3 Tap Exclusion
Once we had chosen an appropriate matching function, we applied it to two separate problems: tap exclusion and tap localization. The first of these problems, tap exclusion, has so far received less attention from the research community. The goal here is to ignore ambient noise and impacts on areas not designated as touch-sensitive.

We identified this as an important feature if the technology is to be used in everyday settings, since such filtering can prevent errant or unintended computer interaction. We developed two levels of tap exclusion. The first level filters out input by monitoring the amplitude of sound directly from the input source; see section 3.2. The second level is correlation exclusion.

The second level of exclusion filters out errant taps based on the tap's correlation to the calibration data. We found that the required threshold on an incoming tap's correlation to the stored library of calibration data could be expressed as a linear function of the calibration point spacing. The trend we found was that the wider the spacing between calibration points, the lower the threshold must be, presumably because a valid tap can fall farther from the nearest calibration point and therefore correlates less strongly with it. Consequently, a larger spacing permits more taps than a tighter spacing.

5.4 Discrete Localization
The goal of tap localization is to determine where an impact originated, based on the received signal. Two approaches to this problem were tried: one discrete and the other interpolated. The discrete solution is much simpler; once a correlation function is selected, input signals can be matched against the grid of calibration samples paired with coordinate locations on the sensitive surface. The calibration point which best matches the received signal is considered to be the location of the impact. In our experiments with grids of properly spaced calibration points, taps not coinciding with a calibration point almost always matched to one of the four nearest calibration points, with occasional errors.

To simplify program operation when high resolution is not required, an alternative discrete solution was implemented, which matches taps to regions on the screen rather than to exact points. Each region must be calibrated with a grid of points that properly covers it, as described in section 5.2. Incoming taps are then matched to a region rather than to an individual calibration point. Figure 2 shows the localization results produced by this technique. The grids of calibration points are not shown; the regions were calibrated on a classroom chalkboard with a 3-inch spacing between points, and the calibration points did not coincide with the test points. The grid overlaid on the regions in the figure shows the test points that were tapped to verify proper operation of region-based discrete matching. Points falling within the regions are matched to the proper region with almost perfect accuracy.

The uncolored grid vertices indicate test points whose taps were excluded. The tap excluder performed poorly in the lower-right corner of the area shown, permitting several taps outside any designated sensitive region. This may be because the sensor was attached to the medium in this area. Another factor may be an improperly selected threshold for tap exclusion. Since creating this figure, we have implemented a tap exclusion scheme that dynamically selects this threshold based on the calibration point spacing, addressing part
of the tap exclusion problem illustrated here.

Fig. 3. Example correlation surface, using a 3-inch mesh spacing.

5.5 Interpolated Localization
With the time reversal algorithm, it may not be necessary to constrain localization to the discrete set of known calibration points. Investigating this possibility is the goal of interpolated localization. The four-point and neural net algorithms promise interpolated localization by modeling the acoustic transfer function of the medium. Time reversal does not attempt to solve the transfer function, and therefore it must perform interpolated localization based solely on the correlation of the input tap with its set of calibration taps. If the calibration grid points are spaced closely enough, the correlation values for grid points provide sufficient sampling of a smooth correlation surface with its peak at the actual location of the impact [7]. Figure 3 shows an example of such a surface from one of our own experiments. The next step is to find the maximum of the (presumably) smooth surface of which each calibration point provides a discrete sample. If the spacing of calibration points is too wide, peaks in the correlation surface will not be properly sampled by the calibration grid, which could cause localization results to be grossly inaccurate.

To estimate the location of the peak, the coordinates of the calibration points are averaged, each weighted by its correlation value, to yield the coordinates of the output point. Performing this weighted average over the entire set of calibration points is not a good solution, however, because it will always bias the output points toward the center of the mesh; poorly correlated calibration points still have a non-zero correlation and will thus be factored into the average. Therefore, some method must be introduced for choosing a set of representative points from the grid so that only those points are averaged together. To further reduce the effect of poorly correlated points, another step is added: once the representative set of points is chosen, the lowest correlation value in the set is subtracted from the correlation value of each point, with values dropping below zero clamped to zero. We tested this weighted-averaging technique using several algorithms for selecting the representative set of points.

5.5.1 Selection by Cells
In this technique, we find the highest discrete point on the surface (that is, the best-correlated calibration point). From the four grid cells which have this point as a corner, we choose the cell with the highest average correlation over its four corners. These four corners form the representative set used for weighted averaging. Selecting only a single cell, however, biases interpolation away from calibration points and toward the centers of cells, since the center of the representative set is always the center of a cell, and this is where the average of the corners' coordinates tends to fall. To avoid this, we instead took as the representative set the best-correlated point together with the eight points surrounding it (i.e., the corners of all four adjacent cells). This technique, however, biases interpolation toward calibration points, since the set of representative points is always centered on a calibration point. In summary, neither of the cell-based techniques provides consistently good interpolated localization.

5.5.2 Individual Selection
Alternative methods of selecting the representative set of points include choosing all points with correlations above a certain threshold and choosing the best n correlated points for some constant n. These methods have the advantage of not depending on a single, maximal discrete point for their geometric arrangement (as the cell-based approaches do). This allows the representative set to vary more freely as the characteristics of the cor-
One possible technique for interpolated localization relation surface change. Also, under the threshold-based
would be to fit piecewise-smooth regression curves to approach, the shift by which all correlations are reduced
the correlation values in each row and column of the before averaging can be a constant (that is, the threshold
calibrated grid. The maxima of these horizontal and itself) instead of being determined by a single point sam-
vertical curves could then be calculated and combined to pled from the representative set. This makes localization
produce a location within the grid. We did not examine less dependent on any single datum, improving stability.
this technique, however, due to a lack of time. In our experiments, these approaches (one based on
The technique we did investigate performs weighted a correlation threshold and the other on a constant n
averaging of calibration point coordinates in the x- and number of points) provided better interpolation results
y-dimensions (using correlation values as the weights) than the cell-based approaches.
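The weighted-averaging step and the selection strategies of Sections 5.5.1 and 5.5.2 can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the implementation used in this project: the correlation surface is assumed to be a row-major list `corr[row][col]` over a regular mesh whose points lie `spacing` units apart (column index mapped to x, row index to y), and every function name (`select_single_cell`, `select_neighbourhood`, `select_threshold`, `select_top_n`, `interpolate`) is hypothetical. Each weight is a correlation minus a shift, clamped at zero; by default the shift is the lowest correlation in the representative set.

```python
# Hypothetical sketch of correlation-weighted interpolation (Section 5.5).
# Data layout and all names are illustrative assumptions, not the authors' code.
from typing import List, Optional, Sequence, Tuple

Grid = List[List[float]]  # corr[row][col]: input-tap correlation with each calibration tap
Point = Tuple[int, int]   # (row, col) index into the calibration mesh


def all_points(corr: Grid) -> List[Point]:
    return [(r, c) for r in range(len(corr)) for c in range(len(corr[0]))]


def best_point(corr: Grid) -> Point:
    """Best-correlated calibration point (the discrete peak of the surface)."""
    return max(all_points(corr), key=lambda p: corr[p[0]][p[1]])


def select_single_cell(corr: Grid) -> List[Point]:
    """5.5.1: corners of the peak-adjacent cell with the highest mean correlation."""
    rows, cols = len(corr), len(corr[0])
    br, bc = best_point(corr)
    best_cell: List[Point] = []
    best_mean = float("-inf")
    for r0 in (br - 1, br):          # top-left corners of the (up to) four
        for c0 in (bc - 1, bc):      # cells that have the peak as a corner
            if 0 <= r0 < rows - 1 and 0 <= c0 < cols - 1:
                cell = [(r0, c0), (r0, c0 + 1), (r0 + 1, c0), (r0 + 1, c0 + 1)]
                mean = sum(corr[r][c] for r, c in cell) / 4
                if mean > best_mean:
                    best_cell, best_mean = cell, mean
    return best_cell


def select_neighbourhood(corr: Grid) -> List[Point]:
    """5.5.1 variant: all corners of the four peak-adjacent cells (3x3 block)."""
    rows, cols = len(corr), len(corr[0])
    br, bc = best_point(corr)
    return [(r, c) for r in range(br - 1, br + 2) for c in range(bc - 1, bc + 2)
            if 0 <= r < rows and 0 <= c < cols]


def select_threshold(corr: Grid, threshold: float) -> List[Point]:
    """5.5.2: every point whose correlation exceeds a fixed threshold."""
    return [p for p in all_points(corr) if corr[p[0]][p[1]] > threshold]


def select_top_n(corr: Grid, n: int) -> List[Point]:
    """5.5.2: the n best-correlated points."""
    return sorted(all_points(corr), key=lambda p: corr[p[0]][p[1]], reverse=True)[:n]


def interpolate(corr: Grid, points: Sequence[Point], spacing: float,
                shift: Optional[float] = None) -> Tuple[float, float]:
    """Correlation-weighted mean (x, y) of the representative set.

    Each weight is corr - shift, clamped at zero. The shift defaults to the
    lowest correlation in the set; pass the threshold for the threshold method.
    """
    if not points:
        raise ValueError("representative set is empty")
    if shift is None:
        shift = min(corr[r][c] for r, c in points)
    total = sx = sy = 0.0
    for r, c in points:
        w = max(corr[r][c] - shift, 0.0)
        total += w
        sx += w * c * spacing  # column index -> x coordinate
        sy += w * r * spacing  # row index -> y coordinate
    if total == 0.0:  # every weight clamped to zero: fall back to the plain centroid
        n = len(points)
        return (sum(c for _, c in points) * spacing / n,
                sum(r for r, _ in points) * spacing / n)
    return sx / total, sy / total


if __name__ == "__main__":
    # Made-up 3x3 surface on a 3-inch mesh; the peak is at the centre point.
    corr = [[0.20, 0.30, 0.20],
            [0.40, 0.90, 0.60],
            [0.20, 0.50, 0.30]]
    print("single cell :", interpolate(corr, select_single_cell(corr), 3.0))
    print("3x3 block   :", interpolate(corr, select_neighbourhood(corr), 3.0))
    print("threshold   :", interpolate(corr, select_threshold(corr, 0.35), 3.0, shift=0.35))
    print("top-3       :", interpolate(corr, select_top_n(corr, 3), 3.0))
```

In this sketch the threshold call passes the threshold as a constant shift, mirroring the argument of Section 5.5.2, while the other three strategies shift by the lowest correlation in their own set; running all four on the same made-up surface shows how the choice of representative set alone moves the interpolated point.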
Fig. 4. Results of average-threshold interpolation, with a power of 25.

There are problems with these techniques, however. The threshold approach has the disadvantage that it requires manual selection of the threshold value, which may need to be re-tuned if the mesh spacing, tap device, or tap surface changes. The "constant-n" approach also has a disadvantage: it relies on a single point (the nth-best-correlated calibration point) to provide the shift by which all the other correlation values are reduced before being averaged. Relying on a single point for anything introduces unwanted variability into the system.

To address these concerns, we tried one last interpolation algorithm. Instead of manually specifying the threshold for selecting the representative set of calibration points, we calculate it each time as the mean of the correlation values of all the calibration points. To bias this mean upwards, however, we raise all the correlations to a power p > 1, take the mean, and then raise the mean to the power 1/p. We experimented with various values of p on a 7-by-7 calibration point mesh and found the best interpolation results around p = 25. With this value of p, each representative set contained an average of 3.1 points. Figure 4 shows a vector plot of the results of this interpolation algorithm with p = 25. Each green circle is the location of an actual impact, either at a calibration point or at the middle of a grid cell, and the corresponding blue arrow points to the location to which the interpolation algorithm mapped that impact. Ideally, then, the blue vectors should all have length zero. The figure shows that this interpolation technique works well in some areas but has a consistent bias in others. This bias, which seemed to persist across repeated taps during experimentation, may be due to inaccuracies in our calibration data set or to acoustic features of the underlying medium. In any case, this was the best interpolation we achieved.

5.6 Performance Concerns

Performing correlation of an input tap with all samples in the calibration set is a time-intensive task. Our goal was to keep localization fast enough that users perceive their input as registering instantaneously. To achieve this goal, we performed several optimizations.

The cross-correlation of signals (which we use to calculate the correlation between two taps) is traditionally computed across its entire domain using Fourier transforms. We, however, do not usually need to compute the cross-correlation across its entire domain, since we are only interested in its maximum value, which usually occurs with a shift value near zero. Our edge detection algorithms ensure that sampled signals are at least roughly aligned along the time axis, so the cross-correlation only needs to be computed over a small interval centered at zero to compensate for slight timing variations in the sampled signals. Thus, instead of taking the Fourier transform of both signals and multiplying them together to find the cross-correlation, we calculate the cross-correlation over a window of ±120 samples (±2.7 ms at our 44.1 kHz sample rate) using a naive algorithm (that is, a simple sum of products for each possible offset):

$$ r_3' = \frac{1}{n-1} \max_{t=-n}^{n} \sum_{i=-120}^{120} \left(\frac{A_i - \bar{A}}{s_A}\right) \left(\frac{B_{i+t} - \bar{B}}{s_B}\right) $$

Finally, since the time reversal algorithm requires several independent correlations to be calculated, it adapts well to parallel computing. Running on a dual-core machine, we were able to halve the running time by using parallel tasks to calculate correlation values for our whole set of calibration data. As an example, we ran our parallelized discrete localization algorithm on a calibration set containing 60 taps; localization took an average of 101 ms each time. This timing is good enough to provide a seamless user experience.

6 SENSING HARDWARE

6.1 Sensor

The sensor we chose was the Knowles accelerometer used by TAI-CHI researchers [4]. The Knowles device, part number BU-21771, is a high-sensitivity ceramic vibration transducer. It was an attractive choice because it requires a low supply voltage (we were able to power it by stepping down the voltage from a USB port) and its output voltage did not require amplification to be read by the sound card.

Early in the project, we pursued our goals using a standard desktop microphone as our sensor; Dr. Sverre Holm had used this type of sensor before us in his demonstration [3]. We discovered several weaknesses in the use of the microphone. The first weakness is that the microphone was highly susceptible to interference from ambient noise, whereas the same ambient noises did not
affect the Knowles vibration transducer. Additionally, we observed that the sound of the impact carried through the air has a significant influence on the desktop microphone. This is a particular problem because we want to monitor only the vibrations that propagate through the medium; the Knowles device performs far better in this respect.

Attempts were made to improve the microphone's performance, such as removing its casing and soundproofing it with added padding. However, these modifications did little to improve its behavior. Ultimately, we identified the Knowles vibration transducer as the appropriate sensor for our project. The transducer's cost was slightly higher than that of the desktop microphone, but the difference was not a deterrent given the performance gains.

6.2 Signal Capture

All results described in this paper were achieved using a normal laptop sound card to digitize the sensor signal; the hardware sample rate was 44.1 kHz. Extant literature supports the conclusion that this arrangement is sufficient for good localization [7], [10].

7 TAP DEVICE HARDWARE

In our early experiments, one theory we had about the poor results we observed was that our tap device was producing an inconsistent tap when struck against the surface. Originally, we used chalk against a slate blackboard, as demonstrated by Holm [3]. Our analysis and observations led us to believe that chalk, when hit against the chalkboard, can produce a different sound each time, causing our localization techniques to produce incorrect results.

The goal then became to find a device that would produce a consistent tap. The proposed device was a spring-loaded center punch with a repeatable mechanical action. Upon running the same tests as with the chalk, we were surprised to find similar results: the consistently tapping center punch did not improve our results. These tests revealed to us that the tap-generating device was not the sole cause of poor algorithm performance.

After that, our investigations into the tap device were shelved while we refined our matching algorithms. During our tests with interpolation, we rediscovered that the inconsistency of tapping with chalk was an issue. This time, we tried tapping with a rubber-coated bolt, which yielded much better results. We surmise that the rubber coating softens the impact, producing a smoother and more regular input to the surface's acoustic transfer function. The results we observed after changing to this tapping device were encouraging: in one experiment, tap matching performance with the rubberized bolt was much improved over the chalk tests.

Our last tap device test was to tap the medium with our fingers. Due to time constraints we were unable to fully test finger tapping; however, in several trials we found that using a finger as the tap generator was possible. We have identified this as an opportunity for future research.

8 SURFACE

The first medium, or surface, we tested on was a slate chalkboard. This medium was chosen as the standard testing surface for consistency and also because of Holm's proven demonstration [3]. While testing on the slate surface, we tried a number of tapping techniques, as described in section 7. Overall, this medium produced good results when used with a consistent tap.

The second medium we tested was a thin plastic sheet of the kind used as an inexpensive pane for framing. With this medium, the intensity of the tap was revealed to have a significant impact on localization performance. Nevertheless, we experienced good localization and tap exclusion results. To accommodate tapping on this surface, we had to lower our impact detection threshold to accept taps with less intense peaks. This was also our first test that did not use a dedicated tapping device: we observed consistent matching when knocking a fingernail against the medium, provided that both the calibrated location and the tap intensity were matched.

The last surfaces we tested our system on were tabletops, ranging in material from solid wood to wood composite. We tested tapping these surfaces with our fingertips and with our tapping device. We were able to lower the impact detection threshold enough to accommodate both tapping techniques, and the observed localization results were acceptable.

We observed that the properties of the tap were slightly altered in each medium tested. These differences required minor changes to the exclusion parameters of the processing algorithms. However, we eventually found parameter settings that accommodate all of the aforementioned surfaces. Additionally, our GUI application lets the user modify these parameters to tune the algorithm's performance for any medium.

9 CONCLUSIONS

In this project, we were not able to achieve our original goal of a robust interpolation algorithm. We did, however, make progress toward this goal and built a strong understanding of the problem domain, which would allow us to make real progress on interpolation given more time. We were able to implement a polished discrete time-reversal solution designed for operation with a projector display overlaid onto the touch surface. This software projects calibration point locations on the screen and walks the user through the process of calibrating each one.

Our investigations into tap exclusion (see section 5.3) take, we believe, a new direction which has not received
significant attention from past researchers. The results of our tap exclusion algorithm are fairly good, and exclusion has been incorporated into our final software deliverables.

10 FURTHER RESEARCH OPPORTUNITIES

10.1 Alternative Surfaces

The great potential of this technology lies in its application to a wide variety of everyday surfaces. It would be interesting to test the sensing hardware and time-reversal software we have developed on surfaces like walls and tile floors. Some of our software parameters would need to be tuned to the new environment, but we surmise that our algorithms would still provide meaningful results in these situations.

10.2 Tap Devices

To reduce the hardware associated with this system, we began testing our solutions with the user's fingertip, rather than a dedicated tapping device, as the method of input. We were unable to devote much time to this challenge; however, it is one improvement that we feel would make this technology ready for ubiquitous use. One challenge we observed when using fingertips as the tap generator was variation in tap intensity: if the intensity was not consistent, the tap would not be recognized. This problem was worse when multiple users tried to use the application; we found it difficult for a user who had not performed the system calibration to find the right intensity and use the application easily.

10.3 Transparent Calibration

The biggest obstacle to the use of time-reversal acoustic localization is the large amount of calibration data required from the user each time the system is set up with a different surface or sensor position. One possible solution is to collect calibration data transparently during normal usage of the system, so that the user does not realize that calibration is taking place. Meaningful calibration data must consist of a known location paired with a received acoustic signal, presumably caused by an impact at that location. To collect such paired data during normal usage, the system must somehow guess where the user actually tapped to produce the signal it has received. If the user is interacting with a GUI composed of discrete sensitive regions (like buttons), one way to guess at this location is to assume that each time the user taps a button, he or she tried to tap the center of that button. Thus, though the system will require enough manual calibration to correctly identify which button the user tapped, after that point it can gradually refine itself using data transparently gathered from user operation.

Other techniques for transparent calibration might include the development of calibration games which deliver some entertainment value while still providing the system with paired locations and acoustic samples.

10.4 Other Interpolation Techniques

None of our interpolation techniques achieved satisfactory localization. This may be because our calibration grids were still too widely spaced, but it may also be that interpolation techniques based on weighted averaging are not a good solution. It may be better to investigate a regression-based solution that fits piecewise-smooth curves to the rows and columns of samples on the correlation surface and then solves those smooth curves for their maxima.

11 REFERENCES

For a video demonstration of this project's results, see [11].

[1] D. T. Pham, Z. Wang, Z. Ji, M. Yang, M. Al-Kutubi, and S. Catheline, "Acoustic pattern registration for a new type of human-computer interface," in IPROMS 2005 Virtual Conference, May 2005.
[2] W. Rolshofen, D. T. Pham, M. Yang, Z. Wang, Z. Ji, and M. Al-Kutubi, "New approaches in Computer-Human Interaction with tangible acoustic interfaces," in IPROMS 2005 Virtual Conference, May 2005.
[3] S. Holm, "Touch sensitive blackboard," Institutt for Informatikk, Universitetet i Oslo, 2008. [Online]. Available: http://www.youtube.com/watch?v=V4NwoiPGkVY
[4] "Technical Solutions for the TDOA Method," Dipartimento di Elettronica e Informazione, Politecnico di Milano, Tech. Rep., 2006.
[5] "AForge.NET," Andrew Kirillov, et al. [Online]. Available: http://www.aforgenet.com/framework/
[6] "Fast artificial neural network library," Steffen Nissen, et al. [Online]. Available: http://leenissen.dk/fann/wp/
[7] R. K. Ing, N. Quieffin, S. Catheline, and M. Fink, "In solid localization of finger impacts using acoustic time-reversal process," Applied Physics Letters, vol. 87, 2005.
[8] "ReverSys technology," Sensitive Object. [Online]. Available: http://sensitive-object.com/-ReverSys-R-
[9] D. T. Pham, M. Al-Kutubi, Z. Ji, M. Yang, Z. Wang, and S. Catheline, "Tangible Acoustic Interface Approaches," in IPROMS 2005 Virtual Conference, May 2005.
[10] "Technical solutions and demonstration for acoustic pattern recognition using time reversal method," Laboratoire Ondes et Acoustique, Tech. Rep., 2006.
[11] S. Pollen and N. Radtke, "Acoustic touch screen demonstration," Milwaukee School of Engineering, 2011. [Online]. Available: http://www.youtube.com/watch?v=ZoAslMiukAQ