Vivek Kwatra, Arno Schödl, Irfan Essa, Greg Turk, Aaron Bobick
This banner was generated by merging the source images in Figure 6 using our interactive texture merging technique.
Figure 5: This figure illustrates the process of synthesizing a larger texture from an example input texture. Once the texture is initialized, we find new patch locations appropriately so as to refine the texture. Note the irregular patches and seams. Seam error measures that are used to guide the patch selection process are shown. This process is also shown in the video.
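The seam error measure referred to in the caption can be made concrete with a small sketch. Under the paper's pixel-based cost, the penalty for cutting between adjacent pixels s and t is |A(s) − B(s)| + |A(t) − B(t)|, where A is the existing texture and B the overlapping patch. The sketch below is our own minimal illustration (grayscale arrays, horizontal neighbors only, and no gradient-based variant); the function name and array layout are assumptions, not the paper's implementation:

```python
import numpy as np

def seam_costs(existing, patch):
    """Matching cost between horizontally adjacent pixel pairs.

    For adjacent pixels s (left) and t (right), the cost of cutting
    between them is |A(s) - B(s)| + |A(t) - B(t)|, where A is the
    existing texture and B the overlapping new patch (grayscale,
    same shape). High values mark pairs a good seam should avoid.
    """
    d = np.abs(existing.astype(float) - patch.astype(float))  # |A - B|
    return d[:, :-1] + d[:, 1:]   # cost for each (left, right) pair

# Example: the two images agree everywhere except the middle column.
A = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0]])
B = np.array([[1.0, 5.0, 3.0],
              [1.0, 5.0, 3.0]])
print(seam_costs(A, B))           # high cost around the disagreeing column
```

Vertically adjacent pairs are handled the same way on the transposed arrays; the resulting costs become the edge weights of the graph whose minimum cut is the seam.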
B. Interactive Merging and Blending: One application of the graph cut technique is interactive image synthesis along the lines of [Mortensen and Barrett 1995; Brooks and Dodgson 2002]. In this application, we pick a set of source images and combine them to form a single output image. As explained in the section on patch fitting for texture synthesis (Section 3), if we constrain some pixels of the output image to come from a particular patch, then the graph cut algorithm finds the best seam that goes through the remaining unconstrained pixels. The patches in this case are the source images that we want to combine. For merging two such images, the user first specifies the locations of the source images in the output and establishes the constrained pixels interactively. The graph cut algorithm then finds the best seam between the images.

This is a powerful way to combine images that are not similar to each other, since the graph cut algorithm finds any regions in the images that are similar and tries to make the seam go through those regions. Note that our cost function as defined in Equation (5) also favors seams that go through edges. Our results indicate that both kinds of seams are present in the output images synthesized in this fashion. In the examples in Figure 11, one can see that the seam goes through (a) the middle of the water, which is the region of similarity between the source images, and (b) around the silhouettes of the people sitting in the raft, which is a high-gradient region.

The SIGGRAPH banner on the title page of this paper was generated by combining flowers and leaves interactively: the user had to place a flower image over the leaves background and constrain some pixels of the output to come from within a flower. The graph cut algorithm was then used to compute the appropriate seam between the flower and the leaves automatically. Each letter of the word SIGGRAPH was synthesized individually and then these letters were combined, again using graph cuts, to form the final banner – the letter G was synthesized only once and repeated. Approximate interaction time for each letter was in the range of 5-10 minutes. The source images for this example are in Figure 6.

It is worthwhile to mention related work on Intelligent Scissors by Mortensen and Barrett [1995] in this context. They follow a two-step procedure of segmentation followed by composition to achieve similar effects. However, in our work, we don't segment the objects explicitly; instead we leave it to the cost function to choose between object boundaries and perceptually similar regions for the seam to go through. Also, the cost function used by them is different from ours.

7 Video Synthesis

One of the main strengths of the graph cut technique proposed here is that it allows for a straightforward extension to video synthesis. Consider a video sequence as a 3D collection of voxels, where one of the axes is time. Patches in the case of video are then whole 3D space-time blocks of video, which can be placed anywhere in the 3D (space-time) volume. Hence, the same two steps from image texture synthesis, patch placement and seam finding, are also needed for video texture synthesis.

As with 2D texture, the patch selection method for video must be chosen based on the type of video. Some video sequences show only temporal stationarity, whereas others show stationarity in space as well as time. For the ones showing only temporal stationarity, searching for patch translations in all three dimensions (space and time) is unnecessary. We can restrict our search to patch offsets in time, i.e., we just look for temporal translations of the patch. However, for videos that are spatially and temporally stationary, we do search in all three dimensions.

We now describe some of our video synthesis results. We start by showing some examples of temporally stationary textures in which we find spatio-temporal seams for video transitions. These results improve upon video textures [Schödl et al. 2000] and compare favorably against dynamic textures [Soatto et al. 2001]. Then we discuss the spatio-temporally stationary type of video synthesis, which improves upon [Wei and Levoy 2000; Bar-Joseph et al. 2001]. All videos are available on our web page and/or included in the DVD.

A. Finding Seams for Video Transitions: Video textures [Schödl et al. 2000] turn existing video into an infinitely playing form by finding smooth transitions from one part of the video to another. These transitions are then used to loop the input video infinitely. This approach works only if a pair of similar-looking frames can be found. Many natural processes like fluids and small-scale motion are too chaotic for any frame to reoccur. To ease visual discontinuities due to frame mismatches, video textures used blending and morphing techniques. Unfortunately, a blend between transitions introduces an irritating blur. Morphing also does not work well for chaotic motions because it is hard to find corresponding features. Our seam optimization allows for a more sophisticated approach: the two parts of the video interfacing at a transition, represented by two 3D spatio-temporal texture patches, can be spliced together by computing the optimal seam between the two 3D patches. The seam in this case is actually a 2D surface that sits in 3D (Figure 7).
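To make the constrained seam computation concrete, here is a toy sketch: overlap pixels become graph nodes, neighboring pixels are linked with their matching cost, and infinite-cost links pin constrained pixels to their patch, so that the minimum cut labels every pixel as coming from patch A or patch B. For readability this uses a plain Edmonds–Karp max-flow rather than the optimized solver of [Boykov et al. 1999]; the tiny 1D overlap, node numbering, and function name are illustrative assumptions, not the paper's implementation:

```python
from collections import deque

def min_cut_source_side(n, edges, s, t):
    """Edmonds-Karp max-flow on an undirected graph; returns the set of
    nodes left on the source side of the minimum cut."""
    res = {}                                  # residual capacities
    adj = [set() for _ in range(n)]
    for u, v, c in edges:
        res[(u, v)] = res.get((u, v), 0.0) + c
        res[(v, u)] = res.get((v, u), 0.0) + c
        adj[u].add(v); adj[v].add(u)
    while True:                               # augment along shortest paths
        parent, q = {s: None}, deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 1e-9:
                    parent[v] = u; q.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v)); v = parent[v]
        aug = min(res[e] for e in path)       # bottleneck capacity
        for u, v in path:
            res[(u, v)] -= aug; res[(v, u)] += aug
    side, q = {s}, deque([s])                 # residual reachability = cut side
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in side and res[(u, v)] > 1e-9:
                side.add(v); q.append(v)
    return side

# Toy 1D overlap of four pixels between patches A and B.
A = [10, 10, 10, 10]                  # patch A pixels in the overlap
B = [10, 9, 10, 4]                    # patch B pixels in the overlap
INF = 1e9
SRC, SNK = 4, 5                       # terminal nodes for patch A / patch B
edges = [(i, i + 1, abs(A[i] - B[i]) + abs(A[i + 1] - B[i + 1]))
         for i in range(3)]           # matching cost between neighbors
edges += [(SRC, 0, INF), (3, SNK, INF)]   # constrained pixels: infinite cost
side = min_cut_source_side(6, edges, SRC, SNK)
print(sorted(side))                   # → [0, 4]: only pixel 0 keeps patch A
```

The same construction carries over to video unchanged: nodes become voxels, neighborhoods become spatio-temporal, and the resulting cut is then a 2D seam surface rather than a path.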
Figure 7: Illustration of seams for temporal texture synthesis. Seams are shown in 2D and 3D for the case of video transitions; the panels show the input video, the window (around a pair of similar frames) within which the seam is computed, the computed seam, and the output video. Note that the seam is a surface in the case of video.

To find the best relative offset of the spatio-temporal texture patches, we first find a good transition by pair-wise image comparison as described in [Schödl et al. 2000]. We then compute an optimal seam for a limited number of good transitions within a window around the transition. The result is equivalent to determining the time of the transition on a per-pixel basis rather than finding a single transition time for the whole image. The resulting seam can then be repeated to form a video loop as shown in Figure 7.

We have generated several (infinitely long) videos using this approach. For each sequence, we compute the optimal seam within a 60-frame spatio-temporal window centered around the best transition. Examples include WATERFALL A, GRASS, POND, FOUNTAIN, and BEACH. WATERFALL A and GRASS have been borrowed from Schödl et al. [2000]. Their results on these sequences look intermittently blurred during the transition. Using our technique, we are able to generate sequences without any perceivable artifacts around the transitions, which eliminates the need for any blurring. We have also applied (our implementation of) dynamic textures [Soatto et al. 2001] to WATERFALL A, the result of which is much blurrier than our result. The BEACH example shows the limitations of our approach. Although the input sequence is rather long – 1421 frames – even the most similar frame pair does not allow a smooth transition. During the transition, a wave gradually disappears. Most disconcertingly, parts of the wave vanish from bottom to top, defying the usual dynamics of waves.

B. Random Temporal Offsets: For very short sequences of video, looping causes very noticeable periodicity. In this case, we can synthesize a video by applying a series of input texture patches, which are randomly displaced in time. The seam is computed within the whole spatio-temporal volume of the input texture. We have applied this approach to FIRE, SPARKLE, OCEAN, and SMOKE. The result for FIRE works relatively well and, thanks to random input patch displacements, is less repetitive than the comparable looped video. The SPARKLE result is also very nice, although electric sparks sometimes detach from the ball. In the case of OCEAN, the result is overall good, but the small amount of available input footage causes undesired repetitions. SMOKE is a failure of this method. There is no continuous part in this sequence that tiles well in time, and parts of the image appear almost static. The primary reason for this is the existence of a dominant direction of motion in the sequence, which is very hard to capture using temporal translations alone. Next, we discuss how to deal with such textures using spatio-temporal offsets.

C. Spatio-Temporally Stationary Videos: For videos that show spatio-temporal stationarity (like OCEAN and SMOKE), considering only translations in time does not produce good results. This is because, for such sequences, there usually is some dominant direction of motion for most of the spatial elements that cannot be captured by just copying pixels from different temporal locations; we need to move the pixels around in both space and time. We apply the sub-patch matching algorithm in 3D for spatio-temporal textures. Using such translations in space and time for spatio-temporal textures shows a remarkable improvement over using temporal translations alone.

The OCEAN sequence works very well with this approach, and the motion of the waves is quite natural. However, there are still slight problems with sea grass appearing and disappearing on the surface of the water. Even SMOKE shows a remarkable improvement. It is no longer static, as was the case with the previous method, showing the power of using spatio-temporal translations. RIVER, FLAME, and WATERFALL B also show very good results with this technique.

We have compared our results for FIRE, OCEAN, and SMOKE with those of Wei and Levoy [2000] – we borrowed these sequences and their results from Wei and Levoy's web site. They are able to capture the local statistics of these temporal textures quite well but fail to reproduce their global structure. Our results show an improvement over them by being able to reproduce both the local and the global structure of the phenomena.

D. Temporal Constraints for Video: One of the advantages of constraining new patch locations for video to temporal translations is that even though we find a spatio-temporal seam within a window of a few frames, the frames outside that window stay as is. This allows us to loop the video infinitely (see Figure 7). When we allow the translations to be in both space and time, this property is lost and it is non-trivial to make the video loop. It turns out, however, that we can use the graph cut algorithm to perform constrained synthesis (as in the case of interactive image merging) and therefore looping. We fix the first k and last k frames of the output sequence to be the same k frames of the input (k = 10 in our implementation). The pixels in these frames are now constrained to stay the same. This is ensured by adding links of infinite cost between these pixels and the patches they are constrained to copy from, during graph construction. The graph cut algorithm then computes the best possible seams given that these pixels don't change. Once the output has been generated, we remove the first k frames from it. This ensures a video loop since the kth frame of the output is the same as its last frame before this removal operation.

Using this technique, we have been able to generate looped sequences for almost all of our examples. One such case is WATERFALL B, which was borrowed from [Bar-Joseph et al. 2001]. We are able to generate an infinitely long sequence for this example, whereas [Bar-Joseph et al. 2001] can extend it to a finite length only. SMOKE is one example for which looping does not work very well.

E. Spatial Extensions for Video: We can also increase the frame-size of the video sequence if we allow the patch translations to be in both space and time. We have been able to do so successfully for video sequences exhibiting spatio-temporal stationarity. For example, the spatial resolution of the RIVER sequence was increased from 170×116 to 210×160. By using temporal constraints, as explained in the previous paragraph, we were even able to loop this enlarged video sequence.

The running times for our video synthesis results ranged from 5 minutes to 1 hour depending on the size of the video and the search method employed – searching for purely temporal offsets is faster than searching for spatio-temporal ones. The use of FFT-based acceleration in our search algorithms was a huge factor in improving efficiency.
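The FFT-based acceleration rests on a standard identity: the sum of squared differences between a patch and every equally-sized window of an image expands into a window-energy term, a patch-energy term, and a cross-correlation term, and the correlation for all offsets at once can be computed with a pair of FFTs (cf. [Kilthau et al. 2002]). The following is our own minimal sketch of that idea; it ignores the masking of partially filled regions that a full synthesis search would need, and the function name and layout are assumptions:

```python
import numpy as np

def ssd_all_offsets(image, patch):
    """Sum of squared differences between `patch` and every equally
    sized window of `image`, for all valid offsets, via the identity
        SSD(o) = sum(window^2) + sum(patch^2) - 2 * corr(image, patch)(o),
    with the correlation term computed by FFT."""
    ih, iw = image.shape
    ph, pw = patch.shape
    image = image.astype(float)
    patch = patch.astype(float)
    # Zero-pad to the full linear-convolution size to avoid wraparound;
    # convolving with the flipped patch gives the cross-correlation.
    fshape = (ih + ph - 1, iw + pw - 1)
    F = np.fft.rfft2(image, s=fshape)
    G = np.fft.rfft2(patch[::-1, ::-1], s=fshape)
    corr = np.fft.irfft2(F * G, s=fshape)[ph - 1:ih, pw - 1:iw]
    # Sliding-window sums of image^2 via an integral image (cumulative sums).
    sq = np.cumsum(np.cumsum(image ** 2, axis=0), axis=1)
    sq = np.pad(sq, ((1, 0), (1, 0)))
    win = sq[ph:, pw:] - sq[:-ph, pw:] - sq[ph:, :-pw] + sq[:-ph, :-pw]
    return win + np.sum(patch ** 2) - 2.0 * corr

# Example: score a small patch against all offsets of a random image.
rng = np.random.default_rng(0)
img = rng.random((8, 9))
patch = rng.random((3, 4))
ssd = ssd_all_offsets(img, patch)
best = np.unravel_index(np.argmin(ssd), ssd.shape)  # best placement offset
```

The argmin of the returned map gives the best placement in one shot, replacing an explicit loop over every candidate offset with a few FFTs and cumulative sums.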
8 Summary

We have demonstrated a new algorithm for image and video synthesis. Our graph cut approach is well suited to computing seams of patch regions and determining the placement of patches to generate perceptually smooth images and video. We have shown a variety of synthesis examples that include structured and random image and video textures. We also show extensions that allow for transformations of the input patches to permit variability in synthesis. We have also demonstrated an application that allows for interactive merging of two different source images. In general, we believe that our technique significantly improves upon the state of the art in texture synthesis by providing the following benefits: (a) no restrictions on the shape of the region where the seam will be created, (b) consideration of old seam costs, (c) easy generalization to the creation of seam surfaces for video, and (d) a simple method for adding constraints.

Acknowledgements

We would like to thank Professor Ramin Zabih and his students at Cornell University for sharing their code for computing the graph min-cut [Boykov et al. 1999]. Thanks also to the authors of [Efros and Freeman 2001], [Wei and Levoy 2000] and [Bar-Joseph et al. 2001] for sharing their raw footage and results via their respective web sites to facilitate direct comparisons. Thanks also to Rupert Paget for maintaining an excellent web page on texture research and datasets, and to Martin Szummer for the MIT temporal texture database. Thanks also to the following for contributing their images or videos: East West Photo, Jens Grabenstein, Olga Zhaxybayeva, Adam Brostow, Brad Powell, Erskine Wood, Tim Seaver and Artbeats Inc. Finally, we would like to thank our collaborators Gabriel Brostow, James Hays, Richard Szeliski and Stephanie Wojtkowski for their insightful suggestions and help with the production of this work. Arno Schödl is now at think-cell Software GmbH in Berlin, Germany.

This work was funded in part by grants from NSF (IIS-9984847, ANI-0113933 and CCR-0204355) and DARPA (F49620-001-0376), and funds from Microsoft Research.

References

Ashikhmin, M. 2001. Synthesizing natural textures. 2001 ACM Symposium on Interactive 3D Graphics (March), 217–226. ISBN 1-58113-292-1.

Bar-Joseph, Z., El-Yaniv, R., Lischinski, D., and Werman, M. 2001. Texture mixing and texture movie synthesis using statistical learning. IEEE Transactions on Visualization and Computer Graphics 7, 2, 120–135.

Boykov, Y., Veksler, O., and Zabih, R. 1999. Fast approximate energy minimization via graph cuts. In International Conference on Computer Vision, 377–384.

Brooks, S., and Dodgson, N. A. 2002. Self-similarity based texture editing. ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH 2002) 21, 3 (July), 653–656.

Burt, P. J., and Adelson, E. H. 1983. A multiresolution spline with application to image mosaics. ACM Transactions on Graphics 2, 4, 217–236.

Crow, F. C. 1984. Summed-area tables for texture mapping. In Proceedings of the 11th annual conference on Computer graphics and interactive techniques, 207–212. ISBN 0-89791-138-5.

De Bonet, J. S. 1997. Multiresolution sampling procedure for analysis and synthesis of texture images. Proceedings of SIGGRAPH 97 (August), 361–368. ISBN 0-89791-896-7. Held in Los Angeles, California.

Efros, A. A., and Freeman, W. T. 2001. Image quilting for texture synthesis and transfer. Proceedings of SIGGRAPH 2001 (August), 341–346. ISBN 1-58113-292-1.

Efros, A., and Leung, T. 1999. Texture synthesis by non-parametric sampling. In International Conference on Computer Vision, 1033–1038.

Ford, L., and Fulkerson, D. 1962. Flows in Networks. Princeton University Press.

Greig, D., Porteous, B., and Seheult, A. 1989. Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society Series B 51, 271–279.

Guo, B., Shum, H., and Xu, Y.-Q. 2000. Chaos mosaic: Fast and memory efficient texture synthesis. Tech. Rep. MSR-TR-2000-32, Microsoft Research.

Heeger, D. J., and Bergen, J. R. 1995. Pyramid-based texture analysis/synthesis. Proceedings of SIGGRAPH 95 (August), 229–238. ISBN 0-201-84776-0. Held in Los Angeles, California.

Kilthau, S. L., Drew, M., and Moller, T. 2002. Full search content independent block matching based on the fast Fourier transform. In ICIP02, I: 669–672.

Li, S. Z. 1995. Markov Random Field Modeling in Computer Vision. Springer-Verlag.

Liang, L., Liu, C., Xu, Y.-Q., Guo, B., and Shum, H.-Y. 2001. Real-time texture synthesis by patch-based sampling. ACM Transactions on Graphics 20, 3 (July), 127–150.

Mortensen, E. N., and Barrett, W. A. 1995. Intelligent scissors for image composition. Proceedings of SIGGRAPH 1995 (Aug.), 191–198.

Portilla, J., and Simoncelli, E. P. 2000. A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision 40, 1 (October), 49–70.

Saisan, P., Doretto, G., Wu, Y., and Soatto, S. 2001. Dynamic texture recognition. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), II: 58–63.

Schödl, A., Szeliski, R., Salesin, D. H., and Essa, I. 2000. Video textures. Proceedings of SIGGRAPH 2000 (July), 489–498. ISBN 1-58113-208-5.

Sedgewick, R. 2001. Algorithms in C, Part 5: Graph Algorithms. Addison-Wesley, Reading, Massachusetts.

Soatto, S., Doretto, G., and Wu, Y. 2001. Dynamic textures. In Proceedings of IEEE International Conference on Computer Vision 2001, II: 439–446.

Soler, C., Cani, M.-P., and Angelidis, A. 2002. Hierarchical pattern mapping. ACM Transactions on Graphics 21, 3 (July), 673–680.

Szummer, M., and Picard, R. 1996. Temporal texture modeling. In Proceedings of IEEE International Conference on Image Processing 1996, vol. 3, 823–826.

Wang, Y., and Zhu, S. 2002. A generative method for textured motion: Analysis and synthesis. In European Conference on Computer Vision.

Wei, L.-Y., and Levoy, M. 2000. Fast texture synthesis using tree-structured vector quantization. Proceedings of SIGGRAPH 2000 (July), 479–488. ISBN 1-58113-208-5.
Figure 8: 2D texture synthesis results. We show results for textured and natural images. The smaller images are the example images used for synthesis. Shown are CHICK PEAS, TEXT, NUTS, ESCHER, MACHU PICCHU ©Adam Brostow, CROWDS and SHEEP from left to right and top to bottom.

Figure 9: Comparison of our graph cut algorithm with Image Quilting [Efros and Freeman 2001]. Shown are KEYBOARD and OLIVES. For OLIVES, an additional result is shown that uses rotation and mirroring of patches to increase variety. The quilting result for KEYBOARD was generated using our implementation of Image Quilting; the result for OLIVES is courtesy of Efros and Freeman.

Figure 10: Images synthesized using multiple scales yield perspective effects. Shown are BOTTLES and LILIES ©Brad Powell. We have, from left to right: input image, image synthesized using one scale, and image synthesized using multiple scales.

Figure 11: Examples of interactive blending of two source images. Shown are HUT and MOUNTAIN ©Erskine Wood in the top row, and RAFT and RIVER ©Tim Seaver in the bottom row. The results are shown in the third column. In the last column, the computed seams are also rendered on top of the results.