
MJS Issue 1

May 18, 2015

A Letter from the Editor


Dear reader,
Welcome to the first issue of U-High's new science journal, the Maroon Journal of Science! This journal will publish biology, chemistry, physics, and computer science papers written by U-High students. The Journal will be published semi-annually. An editorial board will review submissions to the Journal. If you have a paper you'd like to submit, please send it to maroonsciencejournal@gmail.com.

Staff:
Edward Litwin, Editor-in-Chief
Nora Lin, Biology Editor
Dhanya Asokumar, Assistant Biology Editor
Raghu Somala, Assistant Biology Editor
Joanna Cohen, Chemistry Editor
Leah Umanskiy, Chemistry Editor
Jay Dhanoa, Physics Editor
Anna Knes, Assistant Physics Editor
Christine Obert-Hong, Assistant Physics Editor
Walker Melton, Computer Science Editor
Sam van Loon, Assistant Computer Science Editor
Justin Whitehouse, Assistant Computer Science Editor

Vertical Jump
by Christine Obert-Hong
How is it that the average male tiger, which usually weighs 221.2 kg, is capable of jumping 4.55 meters horizontally, while the average man, usually weighing about 79.73 kg, is only able to jump about 1 to 2 meters? Why can the tiny flea jump one hundred times its own body length? The answer to each of these questions lies in how each animal's body puts physics to use. In this paper, I will discuss the physics involved in jumping from a horizontal surface by addressing the concepts of normal force and the force of gravity, the anatomical structure of the jumping animal, and the energy that is transferred through the jumper from takeoff to landing, using cats, humans, and fleas as examples.
The normal force, $F_N$, and the force of gravity, $F_g$, are two opposing forces that act on an object of mass $m$ at the same time, provided the object is on a substrate. A substrate is any solid or liquid that is capable of producing an opposing force on an object. Newton's third law states that when one object exerts a force on a second object, the second object exerts an equal but opposite force on the first. $F_N$ is the opposing force that the substrate exerts on the object. $F_g$ is the attractive force the Earth exerts on an object, regardless of whether or not it is on a substrate. For an object at rest on a flat surface, the two forces are equal in magnitude:

$$F_N = F_g = mg.$$
When an organism wants to jump, it must first push against the substrate with more force than its own weight, so that $F_N$ exceeds $F_g$. One might assume that, because $F_N = F_g$ at rest, it would take very little extra force to get an organism into the air, but the outcome also depends on how quickly the force is exerted. For example, if an animal were to push its foot against the ground slowly, it would not actually jump: the same push spread over a longer period of time produces a smaller net force at each instant. An animal's change in momentum, which equals the impulse (the net force multiplied by the time over which it acts), determines whether or not it will jump.
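As a rough illustration of the impulse argument above, the net upward force on the jumper, added up over the push-off time, equals the momentum at takeoff. The numbers below are assumptions chosen only for illustration (a constant ground force of twice body weight acting for $\Delta t = 0.35$ s, and $g = 9.8\ \mathrm{m/s^2}$); they are not measurements from this paper.

```latex
\[
  J = \int_0^{\Delta t} (F_N - mg)\,dt = m\,v_{\text{takeoff}}
\]
% Assumed for illustration: constant F_N = 2mg and \Delta t = 0.35 s
\[
  (2mg - mg)\,\Delta t = m\,v_{\text{takeoff}}
  \quad\Longrightarrow\quad
  v_{\text{takeoff}} = g\,\Delta t \approx (9.8\ \mathrm{m/s^2})(0.35\ \mathrm{s}) \approx 3.4\ \mathrm{m/s}
\]
```

If $F_N$ never exceeds $mg$, the net impulse is zero and the animal does not leave the ground, however long it pushes.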
A key example of the interaction of the normal force and the force of gravity is a person attempting to jump vertically. To derive the height of the jump, $h$, we first need the distance $d$ by which the person's center of gravity is lowered in the crouch. Newton's third law is at work here: as the jumper pushes down on the ground, the ground pushes back with an equal upward force. The two forces acting on the jumper are therefore the weight $mg$ (downward) and the reaction force $F$ (upward). This gives the jumper's upward acceleration (Davidovits, 2004):

$$a = \frac{F - mg}{m}$$

An athletic person can, as seen through experimentation, produce a reaction force that is 2 times his or her weight in a good jump, so $F = 2mg$. In this instance the height, $h$, will be equal to the distance, $d$, which is in turn proportional to limb length. For the average person, $d$ is approximately 60 cm. In Professor Paul Davidovits' book, Physics in Biology and Medicine, he writes, "This work is converted to kinetic energy as the jumper is accelerated upward. At the full height of the jump h, the velocity of the jumper is zero... At this point the kinetic energy is fully converted to potential energy as the center of mass of the jumper is raised to a height (d + h)" (Davidovits, 2004). Therefore, from conservation of energy, the work done on the body equals its potential energy at maximum height. This leads to the equations (Davidovits, 2004):

$$Fd = mg(d + h)$$

$$h = \frac{(F - mg)\,d}{mg}$$
The anatomical structure of the jumping organism is also an important factor in jumping potential. When we examine the vertical jump of the average cat, we must consider which parts of the cat's anatomy are involved. The cat's vertical jump requires a flexible spine, powerful back and leg muscles, and long, stretchy tendons. This combination allows the release of large amounts of energy in quick bursts. The rapid release of energy enables the cat to jump to heights over 6 times the length of its body. The cat's preparation for its jump has been described as a coiled spring: it bends its legs, knees, back, hips and ankles before it launches into the air (Cat Behavior Explained).
In a study of cats' jumping ability, Michelle Harris and Karen Steudel set out to show how extensor muscle work increases the kinetic and potential energy of the center of mass during takeoff, and to test whether maximum takeoff velocity (TOV) depends on limb length, extensor muscle mass, body mass, and the percentage of fast-twitch muscle fibers.¹ They found that maximum TOV was related to hind limb length and to the ratio of fat mass to lean mass, but not to extensor muscle mass. They believed that limb length would affect endurance and speed in a positive manner, leading to increased jumping ability (Harris, 2002). Even though there have been numerous studies of the physics involved when a cat jumps, not all agree that there is a correlation between limb anatomy and maximum speed (Harris, 2002). TOV is determined by the mechanical work generated by the hind limb muscles between the crouch and the last contact with the ground.

Muscle work $= W m_e$,

where $W$ is the work done per unit extensor muscle mass and $m_e$ is the extensor muscle mass. Because the total work involves both kinetic and potential energy, the following equation is derived:

$$W m_e = \frac{1}{2} m v^2 + mgh,$$

where $m$ is body mass, $v$ is TOV, and $g$ is gravitational acceleration. The optimal angle for a vertical jump is 90°. The height, $h$, is estimated by the stretch of the limb at the onset of the jump. As the length of the limb increases, the center of mass (COM) is accelerated over a greater distance, and a greater takeoff velocity results. This gives the equation

$$v^2 = 2al,$$

where $a$ is the acceleration and $l$ is the length of the limb. Substituting $v^2 = 2al$ and $h = l$ into the energy equation shows the effect of hind limb length on TOV (Harris, 2002):

$$\text{Muscle work} = ml(a + g).$$

¹ Harris, "The relationship between maximum jumping performance and hind limb morphology/physiology in domestic cats (Felis silvestris catus)."

In a human, the anatomical structure of the leg and the transfer of energy through the leg both contribute to jumping ability. Examining the human leg, one may notice that many muscles are involved in the explosive movements. The largest muscle mass in the leg is located near the hip, with much less muscle mass near the knee and ankle. This distribution of muscle mass serves a mechanical purpose.

The preparation for jumping involves flexion of the hip and knee and dorsiflexion of the ankle. During takeoff, the muscles, joint capsules, tendons, and ligaments extend the hip and knee and plantarflex the ankle. The storage and recycling of energy in the elastic material of the muscles and tendons help to propel the jumper (Umberger, 1998). Very little work is done at the joints. The energy generated in the muscles is transferred through the ligaments and tendons. Roughly fifty percent of the energy generated by the large hip muscles is transferred down to the knee and ankle (Umberger, 1998). This transfer of energy from the large muscles to the much smaller muscles of the knee and ankle provides the extra energy that the knee and ankle require to jump efficiently by creating a spring-like effect. Hooke's law of elasticity states that the force necessary to extend or compress a spring by a distance $x$ is directly proportional to that distance, and that the spring's restoring force acts in the opposite direction (Georgia State University):

$$F = -kx,$$

where $F$ is the restoring force and $k$ is the spring constant.

It is important to note that, while the muscles help build up energy, the energy released by the tendons makes up most of the overall force. The relationship also works in reverse: the greater the distance $x$, the more force is released. Thus, the longer the tendon, the more force can be applied to a substrate. When cats crouch, their back and leg tendons tighten, and their body is essentially turned into a slinky.

The flea is one of the animal kingdom's champion jumpers: it is wingless, yet it jumps hundreds of times its own body length. While the flea's remarkable jumping ability is due to the storing and releasing of energy like
a spring, it has not been clear how that energy is transferred from the flea's legs to the ground. Fleas could push off the ground using either their knees (trochantera) or their feet (tarsi). A recent study published in the Journal of Experimental Biology suggested that the flea's muscles are minimally involved at takeoff, and that the force transmitted comes from an elastic structure attached to the base of the leg and the body (Sutton, 2010). This elastic structure is believed to be a lump of resilin, which is a known elastic protein in insects (Sutton, 2010). Further observations in this study led to the conclusion that energy is transferred from this elastic structure to the ground via the tarsi. The study also did not find any resilin in other portions of the leg, suggesting that the elastic energy transfer is initiated at the base of the flea's legs (Sutton, 2010). Similar to the human leg, the flea's leg transmits much of its energy from close to the leg's attachment to the body down to its feet. Additionally, the flea has spines projecting from its legs that help to stabilize it for takeoff (Sutton, 2010). The jump itself occurs when the hind legs are locked and then released: the stored energy is released in an abrupt expansion that pushes the flea into a jump.
In conclusion, the capability and proficiency of jumping involve force, anatomical structure, and the transfer of energy. The capability of jumping in cats, humans, and fleas revolves around Newton's third law, impulse, energy, anatomical structure, work, and Hooke's law of elasticity. Cats, humans, and fleas differ in their proficiency at jumping, but a common thread is that the underlying mechanical physics is unchanged.

References
Davidovits, Paul. Physics in Biology and Medicine. (Cambridge: Elsevier Science, 2004), 31.
Cat Behavior Explained. "Cat Anatomy." Available from http://www.cat-behavior-explained.com/catanatomy.html; Internet.
Harris, Michelle A. "The relationship between maximum jumping performance and hind limb morphology/physiology in domestic cats (Felis silvestris catus)." The Journal of Experimental Biology 205, no. 24 (2002), 377-389.
Umberger, Brian R. "Mechanics of the Vertical Jump and Two-Joint Muscles: Implications for Training." Strength and Conditioning 20, no. 5 (1998): 70.
Georgia State University. "Elasticity, Periodic Motion." Available from http://hyperphysics.phy-astr.gsu.edu/hbase/permot2. Internet.
Sutton, Gregory P. and Burrows, Malcolm. "Biomechanics of jumping in the flea." The Journal of Experimental Biology 214, no. 5 (2010): 836-847.

Differences Between Morphological and Proteomic-based Phylogenies of Sixteen Aquatic Species


by Leah Umanskiy

Introduction
The understanding of the relationship between extant species and their ancestors changes every day. Phylogeneticists use different methods to compare similarities and differences between species in order to trace the path of evolution. Thousands of different phylogenetic trees can be created to describe the same group of species, each representing a different evolutionary path. The study of morphological similarities and the study of proteomic similarities are two common methods used in phylogenetics. Because of these varying methods and results, genus and taxonomic names constantly change (Spaulding et al., 2009). In this study, it was found that different methods produce different evolutionary clades. Previously, we studied the same group of sixteen fish from a morphological standpoint. Phylogenies were created based on morphological similarities and the fewest number of evolutionary steps necessary to reach the chosen extant taxa. To understand this group of taxa thoroughly, it is important to review their evolution in multiple ways. This paper will focus on creating a phylogeny using protein sequence similarities and on the differences between the morphological and proteomic phylogenies. It is essential to study proteomic similarities because natural selection affects proteins and an organism's internal functionality, which do not always translate into morphological features. Proteins also allow scientists to look further back into history because, depending on the chosen protein, most proteins have been developing since before vertebrate evolution (Shimamura et al., 1997). Morphological features may be rather recent additions to an organism, and so may inaccurately reflect the evolution and shared ancestry of the sixteen chosen taxa. Differences between the two types of phylogenies will be compared and discussed. Neither method chosen for this study is superior, but the two methods give results that lead to different evolutionary paths.

Materials and Methods
The methods used to create the two phylogenies differ, but the overall idea is similar. To create the morphological tree, a data matrix (Fig. 1) with sixteen characteristics was completed using various sources (Gingerich et al., 1990). Images found on the Internet were used as primary sources because many characteristics, such as paired appendages and fin type, are external. Scholarly sources were used to explain internal characteristics, including the difference between an air-filled swim bladder, a jelly-filled swim bladder, and lungs, and to find groups of fish with these characteristics. After the data matrix was completed, a morphological phylogeny was created based on the characteristics shared among the sixteen taxa.
When creating a proteomic-based phylogenetic tree with the same species, a data matrix (Fig. 2) was also created to represent similarities between taxa. Similarities were found by comparing protein sequences. Cytochrome c oxidase subunit 1 protein sequences were used for comparison because the protein is found, and has developed, in all of the chosen species. Subunits of cytochrome c oxidase have been successfully used in previous studies of molecular fish phylogenetics (Arce et al., 2013). Cytochrome c plays an important role in the electron transport chain, an energy system that occurs in all living organisms. Cytochrome c oxidase was chosen because it is an oligomeric enzymatic complex that is a component of the respiratory chain and is involved in the transfer of electrons from cytochrome c to oxygen. The protein is not specialized in certain species, so comparing its sequence across all species in this phylogeny gives an accurate representation of ancestry and similarity (Garcia-Hornsman et al., 1994).
One source was used exclusively for obtaining the protein sequences. This database, uniprot.org, strives to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. UniProt is widely used by other researchers and contains protein sequences from a very large number of species. After the protein sequences were obtained, a computer program, LALIGN, was used to compare the sequences and find the similarity percentage. The sequences of every pair of species were compared and the results recorded in the data matrix. Percentages were rounded to whole numbers to simplify the process of creating a tree. Subsequently, a tree was created using the information from the data matrix.
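The pairwise comparison step described above can be sketched in code. The example below is not LALIGN and does not perform alignment. It assumes the sequences have already been aligned to equal length (with `-` marking gaps), counts matching positions, and rounds the percentage to a whole number as in the data matrix. The class name, method names, and the toy sequences are all hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only: percent identity between pre-aligned sequences.
// This is NOT the LALIGN algorithm; alignment is assumed to be done already.
final class IdentityMatrixSketch {

    // Percentage of compared positions that carry the same residue,
    // rounded to a whole number. Positions with a gap ('-') are skipped.
    static int percentIdentity(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        int compared = 0;
        int matches = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (x == '-' || y == '-') {
                continue; // gap column: not counted
            }
            compared++;
            if (x == y) {
                matches++;
            }
        }
        if (compared == 0) {
            return 0;
        }
        return (int) Math.round(100.0 * matches / compared);
    }

    // Fills a symmetric matrix with the rounded identity of every pair.
    static int[][] similarityMatrix(String[] aligned) {
        int n = aligned.length;
        int[][] matrix = new int[n][n];
        for (int i = 0; i < n; i++) {
            matrix[i][i] = 100; // a sequence is identical to itself
            for (int j = i + 1; j < n; j++) {
                int p = percentIdentity(aligned[i], aligned[j]);
                matrix[i][j] = p;
                matrix[j][i] = p;
            }
        }
        return matrix;
    }

    public static void main(String[] args) {
        // Toy sequences, invented for illustration (not real protein data).
        String[] toy = {"MAITRW", "MAVTRW", "MA-TRW"};
        for (int[] row : similarityMatrix(toy)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

A tree-building step would then group the taxa with the highest pairwise percentages first, which is where rounding to whole numbers simplifies the comparison.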

Results
Comparing phylogenetic trees of the same species created by two different methods produced results that were both surprising and expected. Fish in the same genus had the highest similarity in both phylogenies. For example, Saddled Bichir (Polypterus endlicherii) and Cuvier's Bichir (Polypterus ornatiannius) had the highest protein sequence similarity (95%) and the most similar morphological characteristics (Fig. 2). Dogfish (Squalus acanthias) and Great White Shark (Carcharodon carcharias) showed results similar to those of the Bichirs (Polypterus) because both species are categorized as sharks. Paddlefish (Polyodon spathula) and Sturgeon (Acipenser transmontanus) stayed in similar positions in both phylogenies although they do not belong to the same genus (Fig. 3).

The Axolotl (Ambystoma mexicanum), which is not closely related to any of the other chosen species, showed very different ancestral relationships in the two phylogenies. The Axolotl had, on average, only 87% similarity to the remaining fourteen species. The Ropefish (Erpetoichthys calabaricus) completely changed its position, with approximately 92% similarity to the Bichirs and an average of only 89-90% similarity to the other species. On the morphological tree, the Ropefish is most similar to the Axolotl and Tiktaalik (a species not added to the proteomic-based phylogeny). The species mentioned above had either expected or surprising results. The remaining species did not show changes worth noting; their positions can be compared between the two trees.

Figure 1: Morphological Data Matrix. Green = Taxon has the specific property.

Figure 2: Proteomic Data Matrix



Discussion
Creating two phylogenies with different methods shows that the chosen species appear to have different evolutionary steps and ancestors depending on which parts of the organism are experiencing selective pressures. The characteristics chosen for the morphological phylogeny are ones that increase fitness, reduce the energy required to find food, or help the organism adapt to its environment. Conversely, the proteomic-based phylogeny focused on one protein and tracked its changes and differences between taxa. Proteomic methods that use proteins found in most cells do not allow readers to understand the selective pressures taxa may experience or the physical changes that occur in order to conserve energy. For example, the evolution of lungs, which can be seen on the morphological phylogeny, may indicate that a specific taxon transitioned into a semi-aquatic, semi-land dweller. Such changes could not be observed on a proteomic-based phylogeny. Other difficulties were encountered when trying to compare the two phylogenies. Morphological and proteomic-based phylogenetic methods cannot be used interchangeably because the information required to complete both types of data matrix was not available for every taxon.
In order to complete proteomic-based phylogenies, the chosen species must be extant so that their proteins can be sequenced. The morphological tree had an additional species, Tiktaalik (Tiktaalik roseae), a tetrapod fossil, because its morphological characteristics could be interpreted from its bone structure (Gingerich et al., 1990). Tiktaalik was not added to the proteomic-based tree because its protein sequences are unavailable (Spaulding et al., 2009). Morphological trees can be used with both extant and extinct species because a species does not have to be living for its characteristics to be understood. Proteomic-based phylogenies are more objective because they are built with computer programs, whereas morphological characteristics are chosen by personal judgment. Personal judgment of characteristics cannot be considered accurate or universally accepted information because each person may interpret a characteristic differently. For these reasons, it is important to continue studying proteomic-based phylogenies, even though sequences differ from protein to protein because different genes encode different sequences, which can make the phylogenies inconsistent. To find the most realistic phylogenetic tree of the chosen taxa, it would be useful to study multiple proteins in conjunction with morphological characteristics. Each protein or characteristic holds significance in each taxon's evolution, and ignoring an evolutionary aspect would not be useful in understanding the connections between species.
Using protein sequences to find synapomorphies and other relationships between species helped to advance phylogenetic studies beyond morphological characteristics based on fish anatomy. When comparing the two phylogenies of sixteen taxa, there were significant differences with regard to the clades, the placement of the clades, and the similarities between species. Neither phylogeny can be regarded as more accurate because both methods have inconsistencies. Research with more proteins and characteristics will help create a better understanding of the relationships between the taxa chosen for this study.

References
Arce, HM, RE Reis, et al. "Molecular phylogeny of thorny catfishes (Siluriformes: Doradidae)." Molecular Phylogenetics and Evolution 67.3 (2013).
Garcia-Hornsman, J A. et al. "The Super Family of Heme-Copper Respiratory Oxidases." Journal of Bacteriology 176.18 (1994).
Gingerich, Philip, Elwyn Simons, et al. "Hind Limbs of Eocene Basilosaurus: Evidence of Feet in Whales." Science 249.4965 (1990): 154-157.
Shimamura, Mitsuru, and Hiroshi Yasue. "Molecular evidence from retroposons that whales form a clade within even-toed ungulates." Nature 388 (1997).
Spaulding, Michelle, Maureen O'Leary, et al. "Relationships of Cetacea (Artiodactyla) Among Mammals: Increased Taxon Sampling Alters Interpretations of Key Fossils and Character Evolution." PLoS ONE 4.9 (2009).

Changing Technology
editorial by Sam van Loon
Technology is changing extremely rapidly. We're approaching things like phone notifications in our glasses and virtual reality (VR) at breakneck speed, with young, savvy inventors at the forefront. One such person is Palmer Luckey, founder of Oculus VR. When Luckey reached his late teenage years, he became obsessed with VR. He bought all the latest gadgets as they came out, but they always left him feeling unsatisfied ... and nauseous. With a regular video game, lag is annoying; however, in VR games, that lag is vomit-inducing. Imagine turning your head to the side, only to have your view remain static and then quickly snap to where it's meant to be. It messes up your perception to the point where you want to vomit.
Luckey has progressed by leaps and bounds with the technology. Instead of an LCD display, the Oculus Rift uses twin AMOLED displays, which are able to change much more quickly, shaving away critical milliseconds of lag.


The Oculus Rift also makes
VR affordable. Rather than using high-tech, precise lenses, Oculus uses a pair of cheap magnifying lenses. Developers distort
their graphics accordingly so the
games will look right when viewed
with the Rift. Because of this and
other cheap components, the current projected price for the Rift is
only a few hundred dollars.
The graphics processor on the Rift takes 1,000 readings a second, allowing it to pre-render images, and the tracking camera allows the user to approach in-game objects as they would real-world ones. For example, if there were a box in front of you that you wanted to get a closer look at, all you'd have to do would be walk towards it. The pre-rendering of images also takes away that much more lag.
While the Rift may seem like something that can only be used by gamers, there are many practical applications, some of which we have not even thought of yet. The most obvious, of course, is video chatting. Why have a person on the screen in front of you when you can interact with their avatar in your living room? There is also the possibility of remote surgery. A surgeon could don the Rift, pick up virtual tools, and have a precise machine in the operating room replicate their movements from wherever they are.
There are many more innovations than just the Rift. VR is the product of a long series of innovations, beginning with things that we can now only see in museums and history books. While the pace of innovation will surely increase, its direction is rather uncertain. We may even be approaching the end of silicon chips; there's a limit to how fast a silicon machine can go and still be sufficiently small for a user.


Infusion of Young Blood May Be the Answer to Saving Old Brains


by Dhanya Asokumar
A new study published in Nature shows that the infusion of young blood recharges the brains of old mice. This is an important discovery with wide-ranging implications for human memory and aging. Saul Villeda's study shows that plasma from young mice infused into elderly mice reverses some age-related impairments in cognitive function. The study tracked significant molecular, anatomical, and physiological changes in the brain tissues of old mice that shared the blood of young mice (Wyss-Coray, 2014).
The scientists compared the performance in spatial memory tests of the plasma-infused elderly mice to that of young mice and of elderly mice with no plasma infusion (Villeda, 2014). Both the experimental and control groups of mice underwent spatial-navigation tasks, such as recalling the location of a safe platform on which to rest in a water-filled chamber. This experimental procedure, called the Morris Water Maze (MWM), is the most widely accepted test used by scientists to study the effects of aging on rodents. Using cues in the surroundings is key to finding the resting platform, a task that engages the hippocampus of the brain. Young mice are typically more successful in finding this platform, whereas aging mice struggle with this task. However, in this experiment, the old mice infused intravenously with plasma from young mice performed better than the control group of elderly mice. The blood vessels in their brains, which had shortened with age, increased in length after the intravenous injection. Changes were also found in the hippocampus, which is particularly vulnerable to aging.
Exposure to young blood through the coupling of the blood systems of old and young mice also improved stem cell function in muscle, liver, spinal cord, and brain, and ameliorated cardiac hypertrophy (Villeda, 2014). The experiment also showed that some age-related impairments in brain function were reversible. Key regions in the brains of old mice exposed to blood from young mice produced more nerve cells than did the brains of old mice similarly exposed to blood from old mice. Conversely, earlier work had found that exposing young mice to blood from old mice had the opposite effect. Substances in the old blood inhibited new nerve-cell production and also reduced the young mice's ability to navigate in the Morris Water Maze. This earlier work did not examine the impact of young mouse blood on the behavior of elderly mice.
This study could prove important for treating diseases in humans. If the same holds for humans, it could set a new paradigm for recharging elders' aging brains, and it might mean new therapeutic approaches for treating dementias such as Alzheimer's disease (Villeda, 2014). The nation's older population is projected to reach 83.7 million in the year 2050 (Villeda, 2014). This is almost twice the population of 43.1 million in 2012, according to the U.S. Census Bureau (Villeda, 2014). This is an important consideration because aging structurally and functionally changes the adult brain, and is linked to cognitive impairment and vulnerability to degenerative disorders even in healthy individuals. Considering the increase in the proportion of elderly humans, this research is important for identifying a way to mitigate the cognitive impairment that results from the aging process. Clinical trials are currently being planned to test the impact of plasma infusion from young humans into aged patients affected by Alzheimer's disease.

References
Wyss-Coray, Tony. "Infusion of young blood recharges brains of old mice, study finds." Available from: http://med.stanford.edu/ism/2014/may/young-blood.html. Internet; accessed 1 June 2014.
Villeda, Saul A. et al. "Young blood reverses age-related impairments in cognitive function and synaptic plasticity in mice." Nature, no. 20 (2014), 659-663.
Terry, Jr., Alvin V. "Spatial Navigation (Water Maze) Tasks." In J. Buccafusco (Ed.), Methods of Behavior Analysis in Neuroscience, 2nd edition. Boca Raton, Florida: CRC Press, 2009.


CoffeeGVR: Versioned Data Structures for Resilient Algorithms in Java
by Walker Melton

Abstract

The Global View of Resilience (GVR) project at the Chien lab at the University of Chicago is researching the efficient handling of data errors through the creation of versioned arrays and resilient algorithms in the C programming language.² My research focused on writing versioned data structures and resilient algorithms in Java, a higher-level object-oriented programming language. I created useful versioned arrays and trees and a mergesort that tolerates both immediate and silent faults, but the overhead³ is currently too high to be useful at typical real-world error rates. We plan on continuing work on the project to reduce this overhead and make CoffeeGVR a useful tool for everyday applications.

Introduction

While computers are often seen


as infallible, they actually have a
high rate of error due to the sheer
number of hardware components
and the incredible speeds at which
those components run. Most of
these errors are caught by error detection and correction mechanisms
in the processor. Of those errors
that slip past, most are caught by
the operating system. Nevertheless, these hardware- and systemlevel error-correction mechanisms
are often ineffective or inefficient
and errors reach user-level programs. The GVR team estimates that the Mean Time Between Errors on a supercomputer could even drop below an hour due to its high complexity and large size.[1] Such errors can cause crashes, incorrect results, or corrupted data. As computers become larger and faster (especially supercomputers performing mission-critical operations) and control ever more important technology such as cars and airplane traffic control, it will become increasingly important for programs themselves to integrate error-handling elements rather than rely on lower-level tools.

The GVR group in the Chien lab is developing frameworks and programs in C to improve the development of fault-tolerant algorithms. Versioned data structures are one such tool.

Information is often stored in data structures, ranging from the primitive array to red-black binary trees. The structure of the data can provide benefits in organizing, retrieving, and modifying the data. For example, an array can provide constant-time access to data stored within it. A heap⁴ can remove the smallest value in logarithmic time. These data structures are often central to many algorithms. For example, heapsort, an $O(n \log n)$ sorting algorithm,⁵ inserts all the elements into a heap and then removes them one by one and appends them to a sorted list. Because each removal is $O(\log n)$ and it needs to remove $n$ elements, heapsort is $O(n \log n)$ because of its underlying data structure, the heap.

2 I would like to thank Dr. Andrew Chien, the William Eckhardt Professor in the Department of Computer Science, a Senior Fellow in the Computation Institute, and a Senior Computer Scientist at Argonne National Laboratory, and Dr. Nan Dun (postdoctoral scholar) for providing me the opportunity to work on this project and for all of their help along the way.
3 Resources consumed over and above those consumed by the normal data structure or algorithm.
4 A special case of a binary tree in which the two children of a node are always greater than the node.
5 A function $f(x)$ is said to be $O(g(x))$ if there exist real numbers $M$ and $x_0$ such that $|f(x)| \le M |g(x)|$ when $x \ge x_0$. In this case, $f(x)$ is a function describing the number of steps the algorithm takes as a function of the size of the input. It essentially means that $f(x)$ grows at the same or lesser rate as $g(x)$.
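The heapsort argument above can be illustrated with a minimal sketch. It assumes Java's standard java.util.PriorityQueue as the heap; the class and method names (HeapSortSketch, heapSort) are hypothetical and are not part of CoffeeGVR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class HeapSortSketch {
    // Inserts every element into a min-heap, then repeatedly removes the smallest one.
    static List<Integer> heapSort(List<Integer> input) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (Integer value : input) {
            heap.add(value);            // each insertion is O(log n)
        }
        List<Integer> sorted = new ArrayList<>(input.size());
        while (!heap.isEmpty()) {
            sorted.add(heap.poll());    // each removal of the minimum is O(log n)
        }
        return sorted;                  // n removals in total, so O(n log n) overall
    }

    public static void main(String[] args) {
        System.out.println(heapSort(Arrays.asList(5, 3, 8, 1, 4))); // prints [1, 3, 4, 5, 8]
    }
}
```

The sorted order comes entirely from the heap: the loop itself never compares elements.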
Data structures provide a rich environment for creating error handling tools because they are so ubiquitous and useful. The GVR group focuses on creating versioned data structures that preserve the state of computation at a certain time. Versioning the structure ensures that you can always access the data at the time you called version.

The Chien lab has written a versioned array in C. I have implemented and extended this system in Java.

3 Implementation

CoffeeGVR is a Java-based set of versioned data structures and fault-tolerant algorithms. It is organized into several packages, or groups of related classes. The core data structures (GDSArray and GDSTree)⁶ are in subpackages of the main gvr package, as shown in Figure 3 below:⁷

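[UML package diagram: the gvr package contains the Versioned interface and three subpackages: array (GDSArray, GDSArrayList, and the error_injection subpackage holding GDSArrayInteger), tree (GDSTree), and dictionary (GDSDictionary).]

Figure 3: The gvr package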

3.1 Versioned

All of the versioned data structures implement the Versioned interface, which declares methods related to versioning, as diagrammed in Figure 4 below:⁸

6 The classes are prefixed with GDS, standing for Global Domain System, to be consistent with the GVR group's naming conventions.
7 Unified Modeling Language, or UML, is a type of diagram often used to depict the relationship between classes. Each blue box represents a package, a collection of related classes. Each yellow box is a class. These diagrams were created with the LaTeX package tikz-uml.
8 This UML diagram represents an interface, a collection of unimplemented functions. When another class implements an
interface, it is required to implement each of those functions. Each of the items in the lowest box is a function any implementing
classes must implement.


<<interface>> Versioned
+ version_inc() : void
+ move_to_next() : void
+ move_to_prev() : void
+ getNumVersions() : int
+ getCurrVersion() : int

Figure 4: The Versioned interface
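As a rough illustration, the interface in Figure 4 might be written in Java as follows. The method names come from the figure, with underscores assumed where the extracted text shows spaces. The VersionCounter class is a hypothetical implementer that versions nothing but a counter, and its behavior (versions numbered from 1, version_inc() jumping to the newest version) is an assumption rather than CoffeeGVR's documented semantics.

```java
// Method names follow Figure 4; the comments describe assumed behavior.
interface Versioned {
    void version_inc();    // create a new version and make it current
    void move_to_next();   // step forward one version, if one exists
    void move_to_prev();   // step back one version, if one exists
    int getNumVersions();
    int getCurrVersion();
}

// Hypothetical implementer with no data at all, only version bookkeeping.
class VersionCounter implements Versioned {
    private int numVersions = 1;  // assumption: a fresh object starts with one version
    private int currVersion = 1;  // assumption: versions are numbered from 1

    public void version_inc() {
        numVersions++;
        currVersion = numVersions;
    }

    public void move_to_next() {
        if (currVersion < numVersions) currVersion++;
    }

    public void move_to_prev() {
        if (currVersion > 1) currVersion--;
    }

    public int getNumVersions() { return numVersions; }

    public int getCurrVersion() { return currVersion; }

    public static void main(String[] args) {
        Versioned v = new VersionCounter();
        v.version_inc();
        v.version_inc();
        v.move_to_prev();
        System.out.println(v.getCurrVersion() + " of " + v.getNumVersions()); // prints 2 of 3
    }
}
```

Because the interface mentions no element access, a class like this can implement it without being a data structure at all.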


While some might add put
and get methods to the Versioned
interface because it is only being used for data structures, these
methods would limit the use of the
interface to data structures. CoffeeGVR leaves the Versioned interface as generic as possible so that it
can be used for the greatest number of applications.

3.2 Array

The package gvr.array contains the classes GDSArray and GDSArrayList and the package gvr.array.error_injection, which is used for creating artificial errors to test the resilient algorithms.
3.2.1 GDSArray

GDSArray is the basic versioned array. It behaves like an actual array with a fixed size except that it can be versioned.

Internally, GDSArray is implemented as a doubly-linked list of versions. Each version represents the difference between that version and the one before it. It also has pointers to the next and previous versions. GDSArray maintains a reference to the current version, called currentVersion, which is updated on version creation and navigation. Figure 5 below diagrams the relationship between Version and GDSArray:⁹

9 This describes the relationship between Versioned, Version, and GDSArray. Each line indicates a relationship between
two classes. The dotted line ending in a white triangle represents realization, or inheritance. In this diagram, it indicates
that GDSArray implements the Versioned interface. The solid lines ending in a white diamond represent aggregation, or a
weak has-a relationship. In this diagram, the aggregation means that Version has 2 Versions, but their life cycle may not
be dependent on that Version. These are last and next. The solid lines ending in a black diamond indicate composition, a
stronger form of a has-a relationship in which the life cycle of the owned object is dependent on the lifecycle of the owner.


Package gvr

<<interface>> Versioned
+ version_inc() : void
+ move_to_next() : void
+ move_to_prev() : void
+ getNumVersions() : int
+ getCurrVersion() : int

Package array

Version<T>
- changes : HashMap<Integer, T>
- last : Version
- next : Version
- versionNumber : int
- size : int
+ Version(versionNumber : int, last : Version, size : int)
+ get(index : int) : T
+ put(index : int, value : T)
+ remove(index : int)
+ setNext(next : Version)

GDSArray<T>
- numVersions : int
- currVersion : int
- currentVersion : Version
- firstVersion : Version
- guard : HashMap<Integer, T>
+ GDSArray(size : int, default : T)
+ GDSArray(size : int)
+ GDSArray(elements : T[], default : T)
+ size() : int
+ get(index : int) : T
+ get(index : int, age : int) : T
+ put(index : int, new : T)
+ remove(index : int)
+ version_inc()
+ move_to_prev() : boolean
+ move_to_next() : boolean
rawGet(index : int) : T

Figure 5: GDSArray
To get an element, GDSArray first looks in the current version to see if it is in the current version. If it is in the current version, it returns that value. Otherwise, it looks in the version before that, and then the one before that, and so on, until it finds the value or runs out of versions.

Despite its simplicity, this get routine is inefficient: it is $O(P)$, where $P$ is the number of versions. To avoid computing the expensive get operation often, GDSArray memoizes the results of gets and sets. Memoization is an operation which stashes the result of a method. If that value is requested again, it returns the memoized value rather than getting it from the array again. Before executing get as described above, GDSArray first checks if the value at index is already memoized. If it is, it returns that value. Otherwise, it executes the raw get and memoizes the value. GDSArray also memoizes on set. move_to_next() updates the memoization, while move_to_prev() currently resets the elements changed in the current version. As an illustration, the GDSArray depicted in Figure 6 would go through the following steps to get the value at index 4:

1. Check to see if the value is memoized and see that it is not;
2. Look in currentVersion for a mapping from index 4 to some value;
3. Observe that it is not there, and continue looking in the version before the current version;
4. Observe that index 4 isn't there either, so it looks in the previous version; and
5. Find the value in version 1 and return it to the calling function.

Figure 6 depicts the configuration of the GDSArray used in the prior example.
[Diagram: a chain of three version boxes, v1, v2, and v3, with v3 marked as the current version. Each box lists that version's index:value mappings, such as 4:Seven.]

Figure 6: A diagram of a working GDSArray. Each box is a version, and the lists are hash maps which map the first element to the second.
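The lookup and memoization scheme described in this section can be sketched roughly as follows. This is a simplified illustration, not the CoffeeGVR code: it stores the versions in an ArrayList instead of a doubly-linked list, always works on the newest version (navigation is omitted), and every name in it (VersionedArraySketch, versionInc, rawGetCalls) is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class VersionedArraySketch<T> {
    // One map per version; each map holds only the indices changed in that version.
    private final List<Map<Integer, T>> versions = new ArrayList<>();
    // Memoized values for the newest version.
    private final Map<Integer, T> memo = new HashMap<>();
    private final int size;
    private final T defaultValue;
    int rawGetCalls = 0; // counts the expensive walks, for demonstration only

    VersionedArraySketch(int size, T defaultValue) {
        this.size = size;
        this.defaultValue = defaultValue;
        versions.add(new HashMap<>());
    }

    // Starts a new, empty version. The memo stays valid because nothing has changed yet.
    void versionInc() {
        versions.add(new HashMap<>());
    }

    // Records the change in the newest version and memoizes it.
    void put(int index, T value) {
        checkIndex(index);
        versions.get(versions.size() - 1).put(index, value);
        memo.put(index, value);
    }

    // Checks the memo first and falls back to the raw get only on a miss.
    T get(int index) {
        checkIndex(index);
        if (memo.containsKey(index)) return memo.get(index);
        T value = rawGet(index);
        memo.put(index, value);
        return value;
    }

    // Walks from the newest version back to the oldest: O(P) for P versions.
    T rawGet(int index) {
        rawGetCalls++;
        for (int v = versions.size() - 1; v >= 0; v--) {
            Map<Integer, T> changes = versions.get(v);
            if (changes.containsKey(index)) return changes.get(index);
        }
        return defaultValue;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
    }

    public static void main(String[] args) {
        VersionedArraySketch<String> a = new VersionedArraySketch<>(10, "Zero");
        a.put(4, "Seven");   // version 1
        a.versionInc();
        a.put(1, "Five");    // version 2
        a.versionInc();
        a.put(3, "Four");    // version 3
        System.out.println(a.rawGet(4)); // walks back two versions, prints Seven
    }
}
```

The memo trades one extra hash map for skipping the backward walk on repeated reads, which is the same trade-off the text attributes to GDSArray.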
Iteration

GDSArray also allows the creation of iterators at any depth. The iterator() method returns an iterator at depth 0, while the iterator(n) method returns an iterator that looks at depth n.

3.2.2 GDSArrayList

Because of memoization, GDSArray has a fixed size. Many algorithms use a more mutable size, so a form of GDSArray with that feature was spun off into GDSArrayList.

Because it does not use memoization, the get of GDSArrayList is not as efficient as the get of GDSArray. One should use it only when the need for a mutable size trumps the need for the speed benefit of memoization.

3.2.3 error_injection

Another goal of CoffeeGVR is to show that these versioned data structures are useful for error handling, so we needed a way to create artificial errors. GDSArrayInteger has an internal GDSArray of integers and simulates a bit-flip error for testing purposes when the user calls injectError(), as shown in Figure 7:
Package array

GDSArray<T>
- numVersions : int
- currVersion : int
- currentVersion : Version
- firstVersion : Version
- guard : HashMap<Integer, T>
+ GDSArray(size : int, default : T)
+ GDSArray(size : int)
+ GDSArray(elements : T[], default : T)
+ size() : int
+ get(index : int) : T
+ get(index : int, age : int) : T
+ put(index : int, new : T)
+ remove(index : int)
+ version_inc()
+ move_to_prev() : boolean
+ move_to_next() : boolean
rawGet(index : int) : T

Package error_injection

GDSArrayInteger
- array : GDSArray<Integer>
- hasError : boolean
- errorMessage : String
- errorIndices : int[]
+ all of GDSArray
+ resetError()
+ hasError() : boolean
+ errorMessage() : String
+ injectError()
createError()

Figure 7: GDSArrayInteger
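A simulated bit-flip of the kind described for GDSArrayInteger can be sketched with a single XOR. The helper below works on a plain int[] rather than a GDSArray, and its names and signatures (BitFlipSketch, flipBit, and an injectError that takes the array and a Random) are hypothetical stand-ins, not the CoffeeGVR API.

```java
import java.util.Random;

class BitFlipSketch {
    // Returns value with exactly one bit (position 0 to 31) inverted.
    static int flipBit(int value, int bit) {
        return value ^ (1 << bit);
    }

    // Corrupts one randomly chosen element by flipping one randomly chosen bit.
    // Returns the index that was corrupted so a caller can report it.
    static int injectError(int[] data, Random rng) {
        int index = rng.nextInt(data.length);
        data[index] = flipBit(data[index], rng.nextInt(32));
        return index;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40};
        int corrupted = injectError(data, new Random());
        System.out.println("flipped a bit at index " + corrupted + ", value is now " + data[corrupted]);
    }
}
```

Flipping the same bit twice restores the original value, so an injected error of this kind is easy to verify in tests.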
The use of GDSArrayInteger to inject error into resilient algorithms is detailed in section 4.2 below.

3.3 Dictionary

A dictionary maps keys to values. Essentially, it is like an array except it can have any kind of index. GDSDictionary uses the same internal structure, but parameterizes the key type as well as the value type.

3.4 Tree

A binary tree is a data structure consisting of nodes and leafs. Each node has two children, each of which could be a leaf or another node. For example, the following is a binary tree:¹⁰

        A
      /   \
     B     C
    / \   / \
   D   E F   G

Because binary trees are involved in algorithms ranging from searching to sorting, a versioned tree would be very useful. However, a tree preserves more structure than an array list, and this structure will have to be maintained by the versioned tree, a more complicated task than maintaining the relatively simple structure of GDSArray.

One way to maintain the structure of different versions would be to associate a version with each node and have the left and right pointers be lists of pointers, with the stipulation that no two nodes in the list have the same version. The software could then find the version it is looking for by finding the node with the highest version no greater than the one it
10 This image was created using the LaTeX package qtree.

seeks. The versioned nodes can


have pointers to the same nodes,
so no downstream duplication is
needed to maintain the structure.
However, this setup has some
problems. First, it does not provide an easy way to version the
root because there are no pointers to it. This is simply worked
around, however:
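[Diagram: the tree shown earlier, with an additional node labeled INVISIBLE placed above A so that A is no longer the root.]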
While this is a rather ungainly solution, it would work. A more important problem is the overhead. Because objects in Java have overhead, it is best to use no more objects than we need. However, this model uses more objects than
necessary. If the overhead for an
object is and the number of versions is N , then the overhead is
2N + because each version has
an object and its own overhead because it is itself an object.
These problems warrant a better solution. I chose to have a normal tree with the structure being
the union of the structures of all
versions. Each node maintains a
linked list of objects corresponding to its versions. This reduces
the overhead to N + per node
and allows simple versioning of the
root node.

3.4.1 VersionedNode

VersionedNode provides the backbone of GDSTree. It stores the versions and maintains pointers to its left and right trees. It does not, however, maintain the current version. Instead, it requires that a version number be provided in every call to get, put, and remove. GDSTree has a VersionedNode and maintains the version as an integer.

Figure 8 displays the relationship between VersionedNode, Path, and Version. Version holds a value of type T or nothing and its version number. If it holds nothing, it indicates that there is no node there at that version:


<<interface>> Path
+ getDirection() : boolean
+ rest() : Path
+ advance()
+ length() : int

VersionedNode<T>
- root : Version<T>
+ hasLeft(vn : int) : boolean
+ hasRight(vn : int) : boolean
+ getThrow(path : Path, vn : int) : T
+ getThrow(path : String, vn : int) : T
+ get(path : Path, vn : int) : T
+ get(path : String, vn : int) : T
+ putThrow(path : Path, newVal : T, vn : int)
+ putThrow(path : String, newVal : T, vn : int)
+ put(path : Path, newVal : T, vn : int)
+ put(path : String, newVal : T, vn : int)
getHereVal(vn : int) : Version<T>
getRightVal(vn : int) : Version<T>
getLeftVal(vn : int) : Version<T>

Version<T>
+ isNothing : boolean
+ value : T
+ versionNumber : int
+ Version(value : T, vn : int)
+ Version(vn : int)

Figure 8: VersionedNode
Getting from a VersionedNode

VersionedNode handles get with a function called getThrow because it throws NullNodeExceptions and InvalidPathExceptions. getThrow accepts a Path, an object which specifies the path to a node. Path.getDirection() returns true if the path leads to the right and false otherwise.

VersionedNode defines StringPath, a class implementing Path, to save the user the trouble of defining their own Path implementation (they can, of course). StringPath uses strings composed solely of Ls and Rs to represent the path. L means left, and R means right. An empty string indicates the current node.

Because binary trees are naturally recursive, getThrow is as well. The base case of the recursion is when path is empty (i.e. path.length() == 0). This indicates that the calling function wants to get a value from the current node, so getThrow simply returns getHereVal(vn). If the path is nonempty, it first calls getDirection() on path to determine the direction. If the desired node is empty in the current version, it throws an InvalidPathException to indicate that the path does not lead to an existing node. get simply calls getThrow but handles InvalidPathExceptions by returning null.

The forms of get take an integer vn because VersionedNode does not maintain its own version. It uses the parameter to determine what value to return.
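The recursive, path-based lookup described above can be sketched as follows. This is an illustrative simplification, not the CoffeeGVR implementation: it keeps each node's versions in a java.util.TreeMap (so "the highest version no greater than vn" is a floorEntry call) instead of a linked list, it uses plain strings of L and R rather than a Path object, and it throws IllegalArgumentException where the text mentions InvalidPathException. A small put is included only so the example can build a tree, and all names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class VersionedNodeSketch<T> {
    // Version number -> value stored at this node from that version onward.
    private final TreeMap<Integer, T> versions = new TreeMap<>();
    private VersionedNodeSketch<T> left, right;

    // Value visible at version vn: the entry with the highest version number <= vn, or null.
    private T hereVal(int vn) {
        Map.Entry<Integer, T> entry = versions.floorEntry(vn);
        return entry == null ? null : entry.getValue();
    }

    private static boolean goesRight(String path) {
        char c = path.charAt(0);
        if (c != 'L' && c != 'R') throw new IllegalArgumentException("path may contain only L and R");
        return c == 'R';
    }

    // Recursive lookup; throws if the path leaves the tree as it existed at version vn.
    T getThrow(String path, int vn) {
        T here = hereVal(vn);
        if (here == null) throw new IllegalArgumentException("no node here at version " + vn);
        if (path.isEmpty()) return here;               // base case: the current node
        VersionedNodeSketch<T> child = goesRight(path) ? right : left;
        if (child == null) throw new IllegalArgumentException("path does not lead to an existing node");
        return child.getThrow(path.substring(1), vn);  // recurse on the rest of the path
    }

    // Same as getThrow, but reports a missing node as null.
    T get(String path, int vn) {
        try {
            return getThrow(path, vn);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Sets a value at version vn; may extend the tree by at most one node.
    void put(String path, T newVal, int vn) {
        if (path.isEmpty()) {
            versions.put(vn, newVal);
            return;
        }
        boolean toRight = goesRight(path);
        VersionedNodeSketch<T> child = toRight ? right : left;
        if (child == null || child.hereVal(vn) == null) {
            if (path.length() != 1) throw new IllegalArgumentException("can only add one node beyond the structure");
            if (child == null) {
                child = new VersionedNodeSketch<>();
                if (toRight) right = child; else left = child;
            }
        }
        child.put(path.substring(1), newVal, vn);
    }

    public static void main(String[] args) {
        VersionedNodeSketch<String> root = new VersionedNodeSketch<>();
        root.put("", "A", 1);
        root.put("L", "B", 1);
        root.put("R", "C", 2);                 // the right child only exists from version 2
        System.out.println(root.get("R", 1));  // prints null
        System.out.println(root.get("R", 2));  // prints C
    }
}
```

Because the version number is passed into every call, the node itself holds no notion of a current version, which matches the division of labor between VersionedNode and GDSTree described in the text.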

Putting into a VersionedNode

The put method adds an element to the current structure or one node beyond the current structure through the putThrow method, which implements the basic logic and throws InvalidPathExceptions as needed.

The putThrow method is, naturally, recursive. The base case is again when path.length() == 0, in which case it sets the value of the current node at the specified version number to newVal. Otherwise, it checks which direction it should move. If that direction does not lead to an existing node, it checks that the length of the path is one because the user is only allowed to add values one beyond the current structure. It then adds the node. If there is a value, it recursively calls putThrow(path.rest(), newVal, vn) on the specified direction.

Removing a node from a VersionedNode

The remove method simply removes a node from the structure by recursively traveling to the specified node and version number, setting that node's value at the correct version number to nothing, and recursively removing the node's children, its children's children, and so on.

3.4.2 GDSTree

GDSTree has a VersionedNode and maintains the version as an integer. All calls to get, put, and remove are passed to the VersionedNode with the correct version added as a parameter. It implements version_inc(), move_to_prev(), and move_to_next() by simply manipulating the current version number and the number of versions.

4 Results

4.1 Performance of GDSArray

After GDSArray was written, I collected some simple performance and memory usage statistics. First, I measured performance by measuring how long it took to create n versions with some m% changed every version. I also tracked the memory usage as the number of versions increased and m changed. As shown below, the time and memory increase dramatically as the depth of the GDSArray increases. The performance data generated is displayed in Figures 9 and 10:¹¹

[Plot: Time (s) against Depth of GDSArray, both on logarithmic axes, with one series each for 10%, 20%, 40%, and 80% of elements changed per version.]

Figure 9: Get time in GDSArray

[Plot: Memory Usage (Bytes) against Depth of GDSArray, both on logarithmic axes, with one series each for 10%, 20%, 40%, and 80% of elements changed per version.]

Figure 10: Memory Usage of GDSArray

11 The data was taken on a mid-2011 MacBook Air with a 1.6 GHz dual-core Intel i5, 256 KB of L2 cache per core, and 3 MB of L3 cache.

I also measured the performance of get as the number of versions increased. The graph shown below was generated by measuring the time required to retrieve an element from the array. Times were taken using System.nanoTime() and averaged over 10,000 runs to obtain more accurate results. The data seem fairly linear. However, the curve clearly bends upwards when the depth is around 400 versions, possibly due to using RAM rather than cache to store the versions. The results are shown in Figure 11 below:

[Plot: Time (ns), from 0 to 6,000, against Depth of GDSArray, from 0 to 1,000.]

Figure 11: Performance of get in GDSArray
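The measurement procedure described above (System.nanoTime() averaged over many repetitions) might look roughly like the sketch below. It times a java.util.HashMap lookup as a stand-in for a GDSArray get, and the names GetTimingSketch and averageNanos are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class GetTimingSketch {
    // Average wall-clock time of one call to op, in nanoseconds, over the given number of repetitions.
    static double averageNanos(Runnable op, int repetitions) {
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            op.run();
        }
        return (System.nanoTime() - start) / (double) repetitions;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            map.put(i, "value" + i);
        }
        double nanos = averageNanos(() -> map.get(500), 10_000);
        System.out.println("average get time: " + nanos + " ns");
    }
}
```

Timings obtained this way vary from run to run and are affected by JIT warm-up, so they are best read as rough trends rather than exact costs.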

4.2 Immediate Error Recovery in Mergesort

Another goal of CoffeeGVR was to write resilient algorithms using the versioned data structures described above. Therefore, a version of mergesort¹² that handles immediate errors was required.

Injecting errors was performed by using GDSArrayInteger rather than GDSArray and calling injectError with a certain probability when the algorithm combines two lists. Because my project was not concerned with detecting errors, GDSArrayInteger alerts the user when an error has occurred.

The error handling mergesort uses versions to maintain all of the necessary information. Rather than recursively split the list, it creates virtual lists of size one and then combines them into the next higher version. This prevents data from being lost. The virtual separations are maintained with a GDSArray called sepList, which stores the separations between the virtual lists.

When an error is reported, resilient mergesort first calculates the virtual list that holds the error. It then recomputes that part of the array from the two lists that were combined to form that list. For example, in Figure 12, an error occurred at location 1 in version 3. Mergesort recognized that this error is in the subarray between locations 0 and 4 (non-inclusive on the upper end), and recomputed the section shown by the red box to fix the error:

12 Mergesort is an $O(n \log n)$ sorting algorithm that recursively splits the list until each sublist is one element long and then combines the sublists to form the sorted list. A short algorithm for mergesort is: mergesort(a). 1. If a.length() == 1, return a. 2. Otherwise, split a into left and right sublists (l and r) and let sorted_l = mergesort(l) and sorted_r = mergesort(r). 3. Combine sorted_l and sorted_r and return the result.

[Diagram: four stacked versions of the array, v1 at the bottom through v4 at the top, with an error marked in v3.]

Figure 12: Resilient Mergesort. The blue lines represent the separations between the virtual lists
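The recovery idea in this section can be sketched with a bottom-up mergesort that keeps every pass as its own version. This is a simplified illustration on plain int arrays, not the CoffeeGVR implementation: there is no GDSArray or sepList, run boundaries are computed from the run width, and the names (ResilientMergeSketch, sortWithVersions, recompute) are hypothetical. List index k holds runs of width 2^k, so index 2 corresponds to what the text calls version 3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResilientMergeSketch {
    // Merges the sorted runs src[lo, mid) and src[mid, hi) into dst[lo, hi).
    static void merge(int[] src, int[] dst, int lo, int mid, int hi) {
        int i = lo, j = mid;
        for (int k = lo; k < hi; k++) {
            if (i < mid && (j >= hi || src[i] <= src[j])) dst[k] = src[i++];
            else dst[k] = src[j++];
        }
    }

    // Bottom-up mergesort that stores each pass separately.
    // versions.get(k) consists of sorted runs of width 2^k; the last entry is fully sorted.
    static List<int[]> sortWithVersions(int[] input) {
        List<int[]> versions = new ArrayList<>();
        versions.add(input.clone());
        int n = input.length;
        for (int width = 1; width < n; width *= 2) {
            int[] prev = versions.get(versions.size() - 1);
            int[] next = new int[n];
            for (int lo = 0; lo < n; lo += 2 * width) {
                merge(prev, next, lo, Math.min(lo + width, n), Math.min(lo + 2 * width, n));
            }
            versions.add(next);
        }
        return versions;
    }

    // Repairs versions.get(k), for k >= 1, around a corrupted index by re-merging
    // only the run that contains it from the two runs of the previous version.
    static void recompute(List<int[]> versions, int k, int badIndex) {
        int n = versions.get(k).length;
        int width = 1 << k;
        int lo = (badIndex / width) * width;
        merge(versions.get(k - 1), versions.get(k), lo,
              Math.min(lo + width / 2, n), Math.min(lo + width, n));
    }

    public static void main(String[] args) {
        List<int[]> versions = sortWithVersions(new int[] {7, 2, 9, 4, 1, 8, 3});
        versions.get(2)[1] ^= 1 << 5;  // simulate a bit flip at location 1
        recompute(versions, 2, 1);     // re-merges only locations 0 to 3
        System.out.println(Arrays.toString(versions.get(2))); // prints [2, 4, 7, 9, 1, 3, 8]
    }
}
```

Only the run containing the error is rebuilt, which is the source of the savings over restarting the whole sort. Any later versions already derived from the corrupted run would need the same treatment.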

4.3 Silent Error Recovery in Mergesort

I also wrote a version of mergesort that handles a single silent error at a time. Silent errors are errors that have a delay between the time they occur and the time they are reported.

Errors were again injected through combine, but the code prevents the algorithm from handling the error for some length of time.

The silent error recovery uses much of the same code as the immediate error handling mergesort. The major differences are a change to the code that recomputes part of the data and the addition of a function that decides how deeply to recompute the data.

The function findDepth finds the correct depth to recompute the data by counting the occurrences of the elements. Because the silent error will always change one of the values, the count will be different before and after the error occurred. findDepth reports the depth of the first version where the counts change.

4.4 Performance of Resilient Mergesort

When I computed performance statistics, I compared the performance of my immediate error handling mergesort to a naive implementation which simply restarts each time it detects an error. The naive mergesort is faster at realistic error rates. However, my fault-tolerant mergesort provides dramatic performance benefits at high error rates, such as 0.8.¹³ The data are shown in Figure 13 below:

13 The error rate is the chance that an error is injected each time the algorithm combines two lists.

[Plot: Time (ns) on a logarithmic axis against Error Rate, from 0.2 to 0.8, with one series for the Naive mergesort and one for the Versioned mergesort.]

Figure 13: Naive and Versioned Mergesort
Unfortunately, resilient mergesort is currently not more efficient than a naive algorithm except at unreasonably high error rates. At error rates above 0.2, resilient mergesort is significantly faster than a naive mergesort.

Conclusion

Versioning can clearly provide an effective tool for writing resilient algorithms: by preserving information about the historical state of computation, it is simple to handle errors by restarting the computation from some point in time before the error occurred. It also enables more sophisticated error-handling methods, such as that used in the fault-tolerant mergesort explained previously.

Despite its advantages, CoffeeGVR in its current state has too much overhead to be of much use except at very high error rates. I plan to work on reducing the overhead to make CoffeeGVR a useful tool for creating resilient programs in Java.

References
[1] Guoming Lu, Ziming Zheng, and Andrew A. Chien, When are Multiple Checkpoints Needed?, in 3rd
Workshop for Fault-tolerance at Extreme Scale (FTXS), at IEEE Conference on High Performance Distributed Computing, June 2013, New York, New York.
