
Measurement Strategy for Tracking a Tennis Ball

using Microsoft Kinect-like Technology


Acharya Pratik Bhasmang

IIT Bombay, 2017

ABSTRACT

There are many methods for detecting and tracking the position of a ball in various
sports. In this project report I present a new method for tracking tennis balls based on Microsoft
Kinect-like sensors, which use IR and RGB cameras to perceive the depth of objects in front of the
setup. The method can also be applied to other net sports such as badminton and volleyball.

INTRODUCTION

There are several established methods for obtaining three-dimensional ball speed. One of these
is the radar gun, often employed courtside at tennis tournaments to give the speed of
serves. However, radar guns lose accuracy if the object is not moving in a direct line (on
a collision course) with the gun; this is called the cosine effect. Essentially, a radar gun
only reports the closing speed between the moving object and the gun, which restricts its
use to shots that travel directly towards it. Another method for obtaining ball speed is
three-dimensional reconstruction of footage from two cameras. Such a method was used by
Choppin et al. with two calibrated and synchronized high-speed cameras. It relies on a
calibration procedure that requires a number (>20) of synchronized images of a checkerboard
from each camera once the setup is complete. The calibration procedure and the use of two
cameras mean that this method is not suitable here.

Interest in measuring the speed of balls used in sporting games is becoming increasingly
common. Examples are the speed of a bowler's delivery in cricket, the speed of a pitch in
baseball, and the speed of tennis balls, in particular the speed of a serve in the latter
sport. Such speeds are usually measured using radar speed guns. These devices measure the
speed of approach or recession using the well-known (longitudinal) Doppler effect, named
after the Austrian physicist Johann Christian Doppler, in which the frequency difference
between the reflected signal and the transmitted signal (the beat frequency) is directly
related to the relative speed of the ball and the radar (see e.g. Halliday et al 2011). An
increase in the frequency of the reflected signal means the ball is approaching; a decrease
means it is receding. It is important to note that such devices normally measure only the
radial (or line-of-sight) velocity and thus, except in special cases, will always
under-estimate the true velocity. As a simple example, in tennis one occasionally hears
commentators allude to this problem by conjecturing that the speed of a wide serve is
under-estimated compared to a serve down the centre of the court, presumably because the
speed gun is normally placed in line with the centre-line of the court. There is also the
question of whether the indicated speed is the initial speed or an average speed, which is
discussed even less often. An accurate tracking method is therefore needed.
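The cosine effect described above is easy to quantify: a radar gun reports only the radial component v·cos(θ), where θ is the angle between the ball's path and the gun's line of sight. A minimal sketch (the serve speed and angle below are illustrative values, not measurements from this report):

```python
import math

def radar_reading(true_speed_mps, angle_deg):
    """Speed reported by a radar gun for a ball moving at true_speed_mps
    along a path that makes angle_deg with the gun's line of sight.
    The gun only sees the radial component: v_radial = v * cos(theta)."""
    return true_speed_mps * math.cos(math.radians(angle_deg))

# A 200 km/h serve, once hit straight at the gun and once 20 degrees wide:
true_v = 200 / 3.6            # convert km/h to m/s
head_on = radar_reading(true_v, 0.0)   # full speed is recovered
wide = radar_reading(true_v, 20.0)     # under-estimated by the cosine factor
```

At 20 degrees off-axis the reading is already about 6% low, which matches the commentators' intuition about wide serves.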

BALL DETECTION ALGORITHMS (Working of Kinect sensors):

Voxelization is the process of adding depth to an image using a set of cross-sectional
images known as a volumetric dataset. These cross-sectional images (or slices) are made up
of pixels. The space between any two pixels in one slice is referred to as the interpixel
distance, which represents a real-world distance, and the distance between any two slices is
referred to as the interslice distance, which represents a real-world depth.
The dataset is processed by stacking the slices in computer memory, spaced according to the
interpixel and interslice distances so as to accurately reflect the real-world sampled volume.
Next, additional slices are created and inserted between the dataset's actual slices so that
the entire volume is represented as one solid block of data.
Once the dataset exists as a solid block of data, the pixels in each slice take on
volume and become voxels.
For a true 3D image, the voxels must undergo an opacity transformation, which assigns
different opacity values to voxels. This is important when it is crucial to expose interior
details of an image that would otherwise be hidden by darker, more opaque outer-layer
voxels.
Voxel images are primarily used in the field of medicine and are applied to X-rays, CAT
(Computed Axial Tomography) scans, and MRI (Magnetic Resonance Imaging) so that
professionals can obtain accurate 3D models of the human body.
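At its core, the voxelization step used for ball detection amounts to quantizing continuous 3D coordinates into discrete grid cells. A minimal sketch in Python (the 0.5 m cell size is an arbitrary illustrative choice, not a parameter from this report):

```python
def voxelize(points, voxel_size):
    """Quantize 3-D points (x, y, z) into voxel grid cells.
    Returns a dict mapping integer cell indices to point counts."""
    grid = {}
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        grid[key] = grid.get(key, 0) + 1
    return grid

# Two nearby points share a cell; the third point lands elsewhere.
cloud = [(0.05, 1.20, 2.00), (0.07, 1.22, 2.01), (3.00, 0.02, 1.50)]
grid = voxelize(cloud, voxel_size=0.5)
```

A dictionary keyed by cell index keeps the grid sparse, which suits a tennis court where most of the volume is empty air.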

The general algorithm to detect a ground or flying ball is the following:
Compute ground plane from first image
Compute transform to align ground with XZ plane
for each point cloud do
    Align point cloud with XZ plane
    Voxelize into grid
    Detect flying ball at y > 1 (use flying-object mask)
    if no flying ball then
        Detect ground ball at y = 1 (use ground mask that ignores the bottom
        part of the flying mask)
    end if
end for
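The per-frame detection step above can be sketched as follows; the voxel-grid and mask structures here are hypothetical simplifications chosen for this illustration, not the report's actual data structures:

```python
def detect_ball(voxel_grid, flying_mask, ground_mask, min_height_cells=1):
    """Sketch of the detection step: look for an occupied voxel above the
    ground level first (flying ball), then fall back to a ground ball.
    voxel_grid maps (x, y, z) integer cells to occupancy counts; the masks
    are sets of cells to ignore (e.g. players, net posts)."""
    flying = [c for c in voxel_grid
              if c[1] > min_height_cells and c not in flying_mask]
    if flying:
        # Take the highest candidate as the flying ball.
        return max(flying, key=lambda c: c[1]), "flying"
    ground = [c for c in voxel_grid
              if c[1] <= min_height_cells and c not in ground_mask]
    if ground:
        return ground[0], "ground"
    return None, "none"
```

For example, a grid containing cells {(3, 4, 2), (1, 0, 7)} with empty masks yields the cell at height 4 as a flying ball.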
The general algorithm for trajectory estimation is as follows:
Update history
if ball detected then
    if new position supports previous trajectory then
        Compute trajectory with all points
    else
        Compute new trajectory with last 3 points
    end if
    if trajectory error below threshold then
        Update new trajectory
    end if
end if
if NOT (last 2 positions exist and support trajectory) then
    Reset trajectory
end if
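The "new position supports previous trajectory" test can be sketched as a free-flight prediction check: extrapolate from the last two positions under gravity and accept the new detection if it lands close to the prediction. The tolerance value is an assumed threshold for illustration:

```python
G = -9.81  # gravity (m/s^2), acting along the vertical (y) axis

def predict_next(p_prev, p_curr, dt):
    """Constant-velocity extrapolation plus gravity on y: where the ball
    should be one frame later if it is in free flight."""
    vx = (p_curr[0] - p_prev[0]) / dt
    vy = (p_curr[1] - p_prev[1]) / dt
    vz = (p_curr[2] - p_prev[2]) / dt
    return (p_curr[0] + vx * dt,
            p_curr[1] + vy * dt + 0.5 * G * dt * dt,
            p_curr[2] + vz * dt)

def supports_trajectory(p_prev, p_curr, p_new, dt, tol=0.10):
    """A new detection supports the trajectory if it falls within tol
    metres of the free-flight prediction (tol is an assumed value)."""
    pred = predict_next(p_prev, p_curr, dt)
    err = sum((a - b) ** 2 for a, b in zip(pred, p_new)) ** 0.5
    return err < tol
```

At the Kinect's 30 FPS frame rate, dt would be 1/30 s; a detection far from the prediction triggers the "compute new trajectory with last 3 points" branch instead.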

DESIGN OF THE INSTRUMENT

Components of Kinect Sensor: The Kinect sensor bar contains a depth sensor, a color
camera, a special infrared light source, and four microphones. The major components of
the Kinect sensor are shown in the figure. A tilt motor working as the base enables the
device to be tilted upward and downward. The Kinect sensor components are described
below:
Color Camera: The color camera captures and streams color video data at a frame
rate of 30 frames per second (FPS) in the red, green, and blue channels. The video
stream consists of successive image frames with a resolution of 640 x 480 pixels. The
field of view (FOV) of the color camera is 43 degrees vertical by 57 degrees horizontal.
Infrared (IR) Emitter and IR Depth Sensor: The Kinect can provide 3D information
about a scene or an object. The depth map of the environment in front of the camera is
obtained directly from the Kinect device. The processing of the depth signals is done
entirely inside the sensor, and the generated depth map is transmitted in the same way
as the color image; the only difference is that each pixel of a depth image contains
distance information, so the sensor transmits a distance value for each depth pixel.
The figure shows the depth-sensing process used to obtain the distance information of a
scene or an object. The Kinect device contains two depth-sensing components: an IR
emitter and an IR depth sensor. The IR emitter is mounted like a camera on the Kinect,
but it is in fact an IR projector that casts infrared light onto the objects in a
"random dot pattern". The dot pattern projected onto the objects is captured by the IR
depth sensor, which extracts depth information from the dotted light reflected off
different objects. This invisible dot information is used to calculate the distance
between the sensor and the point from which each IR dot was read, and is transformed
into depth data.
Figure: Kinect depth-sensing process to obtain distance information with the Infrared
(IR) Emitter and IR Depth Sensor.
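The depth computation behind the dot pattern is stereo triangulation between the IR projector and the IR camera. A sketch of the geometry (the focal length and baseline below are approximate nominal Kinect v1 values assumed for illustration, not figures from this report):

```python
def depth_from_disparity(disparity_px, focal_px=580.0, baseline_m=0.075):
    """Triangulation behind the dot-pattern approach: the IR projector and
    IR camera form a stereo pair separated by baseline_m metres. A dot
    observed with a horizontal shift of disparity_px pixels lies at depth
    Z = f * b / d."""
    return focal_px * baseline_m / disparity_px

# With these assumed intrinsics, a dot shifted by 21.75 px sits at
# 580 * 0.075 / 21.75 = 2.0 m from the sensor.
z = depth_from_disparity(21.75)
```

The inverse relationship between disparity and depth is also why depth resolution degrades at long range: at large Z a whole pixel of disparity spans a much larger slice of distance.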
Depth Data Processing: The depth stream contains a series of depth frames in which each
pixel holds distance information in millimetres. The three resolutions supported by the
depth stream are 640 x 480, 320 x 240, and 80 x 60 pixels, and the depth data gives the
distance to the nearest object from the camera plane at each particular (x, y)
coordinate. The depth sensor's field of view is the same as that of the color camera.
The Kinect sensor uses an IR emitter and an IR depth sensor, which is a monochrome CMOS
(Complementary Metal-Oxide-Semiconductor) sensor, to capture the 3D information of an
object. The steps of depth data processing are detailed in the figure (Steps of the
Kinect Depth Data Processing with IR Emitter and IR Depth Sensor) and explained as
follows: (1) The PrimeSense chip sends a signal to the IR emitter to turn on the
infrared light and capture depth data. (2) The chip also sends a signal to the IR depth
sensor to initialize it. (3) The IR emitter starts emitting electromagnetic radiation
towards the objects in front of the camera; the IR light is invisible because its
wavelength is longer than that of visible light. (4) The IR depth sensor captures the
depth information and obtains the distance between the sensor and the point from which
each IR dot was read. (5) The depth sensor returns the coded depth light to the
PrimeSense chip. (6) The PrimeSense chip processes the depth stream frame by frame to
create the output display data and form a depth image ready for display.
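Once each depth pixel carries a distance, a full 3D position can be recovered by back-projecting through a pinhole camera model built from the stated field of view. A sketch (a real system would use calibrated intrinsics rather than FOV-derived ones):

```python
import math

def pixel_to_3d(u, v, depth_mm, width=640, height=480,
                fov_h_deg=57.0, fov_v_deg=43.0):
    """Back-project a depth pixel (u, v, depth in mm) to camera-frame
    (X, Y, Z) in metres, deriving focal lengths from the sensor's stated
    field of view and resolution."""
    fx = (width / 2) / math.tan(math.radians(fov_h_deg / 2))
    fy = (height / 2) / math.tan(math.radians(fov_v_deg / 2))
    z = depth_mm / 1000.0
    x = (u - width / 2) * z / fx
    y = (v - height / 2) * z / fy
    return x, y, z

# The centre pixel maps straight down the optical axis:
x, y, z = pixel_to_3d(320, 240, 2000)
# → (0.0, 0.0, 2.0)
```

Applying this to every pixel of a depth frame yields the point cloud consumed by the ball detection algorithm above.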
Tilt Motor: The tilt motor connects the base and body of the sensor with a small
motor that gives a vertical adjustment range of -27 to +27 degrees. The Kinect sensor
can be tilted upwards or downwards by up to 27 degrees, increasing the range over which
it can capture color and depth data. The motor can be controlled to adjust the elevation
angle of the sensor in order to get the best view of the scene or object.
Microphone Array and LED: The Kinect uses four microphones in the sensor bar,
arranged in a linear fashion, to locate sound. It can detect audio and indicate the
angle from the sensor to a sound source. The Kinect microphone array captures and
recognizes the audio beam effectively, with enhanced noise suppression, echo
cancellation, and beam-forming technology. An LED on the Kinect device indicates
whether the device drivers have loaded properly: it shows green when the Kinect is
connected to the computer and the device is ready for use in applications. It is placed
between the projector and the camera.
We place 8 such sensors along the net, 4 on each side, as shown in the figure
below; red dots indicate the sensor locations. We need to use stronger sensors since
each half of the court is around 40 ft long.
Parts of a Kinect Sensor:

Placement of sensors in the court (Indicated by red dots):


SOURCES OF ERROR

Error and imperfection in the Kinect data may originate from three main sources: the
sensor itself, the measurement setup, and the properties of the object surface. For a
properly functioning device, sensor errors mainly refer to inadequate calibration and
inaccurate measurement of disparities. Inadequate calibration and/or error in the
estimation of the calibration parameters lead to systematic error in the object
coordinates of individual points. Such systematic errors can be eliminated by proper
calibration as described in the previous section. Inaccurate measurement of disparities
within the correlation algorithm and round-off errors during normalization result in
errors which are most likely of a random nature. Errors caused by the measurement setup
are mainly related to the lighting conditions and the imaging geometry. The lighting
conditions influence the correlation and the measurement of disparities: in strong
light the laser speckles appear at low contrast in the infrared image, which can lead
to outliers or gaps in the resulting point cloud. The imaging geometry includes the
distance to the object and the orientation of the object surface relative to the
sensor. The operating range of the sensor is between 0.5 m and 5.0 m according to the
specifications, and the random error of the depth measurement increases with increasing
distance to the sensor. Also, depending on the imaging geometry, parts of the scene may
be occluded or shadowed.
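The growth of random depth error with distance is commonly modelled as quadratic in the Kinect error-analysis literature. A sketch with an assumed coefficient, chosen here so the error reaches roughly 4 cm at the 5 m range limit (the constant is illustrative, not a calibrated value from this report):

```python
def depth_std_mm(z_m, k=1.6e-3):
    """Assumed random depth-error model: standard deviation grows roughly
    quadratically with distance, sigma_Z ≈ k * Z^2, with Z in metres and
    the result in millimetres. k is picked so sigma ≈ 40 mm at Z = 5 m."""
    return k * z_m ** 2 * 1000.0

# Error budget across the sensor's stated 0.5-5.0 m operating range:
for z in (1.0, 3.0, 5.0):
    print(f"Z = {z:.0f} m -> sigma ≈ {depth_std_mm(z):.1f} mm")
```

This quadratic growth is a direct consequence of the inverse disparity-depth relationship: a fixed disparity quantization step covers a larger depth interval the farther away the surface is, which is why sensors closest to the ball's flight path contribute the most accurate samples.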

CONCLUSION AND FURTHER IMPROVEMENTS

In this project, we determined the position and velocity of a tennis ball using an array of
Kinect-like sensors. Acceleration can also be obtained from the same data. Stronger sensors
could be used to improve the resolution and range, and a refined version could be developed
for sports like cricket and football.
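Given the tracked positions, velocity and acceleration follow from finite differences between consecutive frames. A minimal sketch (forward differences are used for brevity; a real system would smooth the noisy depth data first):

```python
def finite_diff(positions, dt):
    """Estimate per-frame velocity and acceleration vectors from a list of
    tracked 3-D positions sampled every dt seconds, using forward
    differences: v[k] = (p[k+1] - p[k]) / dt, a[k] = (v[k+1] - v[k]) / dt."""
    vel = [tuple((b[i] - a[i]) / dt for i in range(3))
           for a, b in zip(positions, positions[1:])]
    acc = [tuple((b[i] - a[i]) / dt for i in range(3))
           for a, b in zip(vel, vel[1:])]
    return vel, acc

# A ball dropped from rest, sampled at the Kinect's 30 FPS, recovers
# gravitational acceleration on the y axis.
dt = 1 / 30
g = 9.81
pos = [(0.0, -0.5 * g * (k * dt) ** 2, 0.0) for k in range(3)]
vel, acc = finite_diff(pos, dt)
```

Three consecutive positions are enough for one acceleration sample, which is consistent with the trajectory algorithm's use of the last 3 points to start a new fit.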

REFERENCES

Kajal Sharma, "Kinect Sensor Based Object Depth Estimation."
Wikipedia, "Kinect," en.wikipedia.org/wiki/Kinect.
Benjamin Choo, Michael Landau, Michael DeVore, and Peter A. Beling, "Statistical
Analysis-Based Error Models for the Microsoft Kinect™ Depth Sensor."
Joao Cabral and Pedro Lima, "Kinect-Based Algorithms for Motion Analysis."
John MacCormick, "How Does the Kinect Work?"
Garry Robinson and Ian Robinson, "Radar Speed Gun True Velocity Measurements of
Sports-Balls in Flight: Application to Tennis."
Ryan Monaghan, "GPS Satellite Position Estimation from Ephemeris Data by Minimum Mean
Square Error Filtering Under Conditions of Selective Availability."
