SIKULI SCRIPT
AT, MAHATMA GANDHI INSTITUTE OF TECHNICAL EDUCATION & RESEARCH CENTRE-NAVSARI Year-2011
A Seminar Report on
SIKULI SCRIPT
Submitted to,
Department of Computer Science Engineering
Submitted by,
SNEHAL M PATEL B.E 3rd COMPUTER (5th SEM) Enrolment No. 090330131025
CERTIFICATE
This is to certify that the seminar entitled "Sikuli Script" is submitted by Snehal M Patel, bearing Enrolment No. 090330131025, of the Computer Science Engineering Department (B.E 3rd, Sem V) in fulfillment of the requirement, and that he has satisfactorily completed his work for the academic year JUNE-2011 to OCT-2011.
Diya Vadhwani
Internal Guide,
Sikuli Script
ACKNOWLEDGEMENT
I am extremely grateful to Prof. Mukesh Patel, Head of the Computer Science Department, MGITER, Navsari, for providing all the resources required for the successful completion of this seminar. My heartfelt gratitude goes to my internal guide Diya Vadhwani, Associate Professor, for her valuable suggestions and guidance in the preparation of the seminar report. I would be failing in my duty if I did not acknowledge, with grateful thanks, the authors of the references and other literature referred to in this seminar. Last but not least, I am very thankful to my parents, who guided me in every step I took.
- SNEHAL M PATEL
MGITER/ CO/2011
ABSTRACT
The system, designed by associate professor Rob Miller, grad student Tsung-Hsiang Chang, and the University of Maryland's Tom Yeh, is called Sikuli, which means "God's eye" in the language of Mexico's Huichol Indians. In a paper that won the best-student-paper award at the Association for Computing Machinery's User Interface Software and Technology (UIST) conference in 2009, the researchers showed how Sikuli could aid in the construction of scripts, short programs that combine or extend the functionality of other programs. Using the system requires some familiarity with the common scripting language Python, but it requires no knowledge of the code underlying the programs whose functionality is being combined or extended. When the programmer wants to invoke the functionality of one of those programs, she simply draws a box around the associated GUI, clicks the mouse to capture a screen shot, and inserts the screen shot directly into a line of Python code.
Sikuli Script is a visual technology, still under development, for searching and automating graphical user interfaces (GUIs) using images (screenshots). It automates anything we see on the screen without support from internal APIs. The first release of Sikuli contains Sikuli Script, a visual scripting API for Jython, and Sikuli IDE, an integrated development environment for writing visual scripts with screenshots easily. Sikuli Script enables the programmer to write programs against the user interface instead of an API. We can programmatically control a web page, a desktop application running on Windows/Linux/Mac OS X, or even an iPhone / Android / Symbian application running in an emulator. The developers behind this project are,
INTRODUCTION
Until the 1980s, using a computer program meant memorizing a lot of commands and typing them in a line at a time, only to get lines of text back. The graphical user interface, or GUI, changed that. By representing programs, program functions, and data as two-dimensional images such as icons, buttons and windows, the GUI made intuitive and spatial what had been memory-intensive and laborious. But while the GUI made things easier for computer users, it didn't make them any easier for computer programmers. Underlying GUI components is a lot of computer code, and usually, building or customizing a program, or getting different programs to work together, still means manipulating that code.
Researchers in MIT's Computer Science and Artificial Intelligence Lab hope to change that, with a system that allows people to write programs using screen shots of GUIs. Ultimately, the system could allow casual computer users to create their own programs without having to master a programming language. Researchers at the University of Maryland and Massachusetts Institute of Technology have developed a screen-capture-based scripting environment that could signal a new programming paradigm that leverages the graphical interface as a sort of API. The Sikuli system lets users with minimal programming experience use GUI screen shots to create scripts that interact with applications. Ultimately, it will open opportunities to develop scripts that touch multiple applications without requiring any understanding of the underlying programs' APIs.
In human-to-human communication, asking for information about tangible objects can be naturally accomplished by making direct visual references to them. For example, to instruct a mover to put a lamp on top of a nightstand, we would say "put this over there" while pointing to the lamp and the nightstand, respectively.
Likewise, in human-to-computer communication, finding information or issuing commands involving GUI elements can be accomplished naturally by making direct visual references to them. Sikuli allows a user or programmer to make direct visual references to GUI elements. To search a documentation database about a GUI element, a user can draw a rectangle around it and take a screenshot as a query. Similarly, to automate interactions with a GUI element, a programmer can insert screenshots directly into a script statement and specify what keyboard or mouse action to invoke when this element is seen on screen.
HOW IT WORKS
2.1 Sikuli: Seeing Pixels
Sikuli's greatest value is its generality: "If it has pixels that Sikuli can see, then it's open to automation". The technique is open to any application with a GUI that can be displayed on a Windows, Mac, or Linux desktop. Users have already applied it not just to desktop applications, but also to Web pages, video games, mobile phone apps (running in a simulator or using a remote connection between the desktop and the phone), and applications from other platforms running in a virtual machine.
(Fig: Sikuli Editor)
The developers built an editor to help users write visual scripts (see the figure above). To take a screenshot of a GUI element to add to a script, a user can click on the camera button (a) in the toolbar to enter the screen capture mode. The editor hides itself automatically to reveal the desktop underneath, and the user can draw a rectangle around an element to capture its screenshot. The captured image can be embedded in any statement and is displayed as an inline image. The editor also provides code completion. When the user types a command, the editor automatically displays the corresponding command template to remind the user what arguments to supply. For example, when the user types find, the editor will expand the command. The user can click on the camera button to capture a screenshot to be the argument for this find() statement. Alternatively, the user can load an existing image file from disk (b), or type the filename or URL of an image, and the editor automatically loads it and displays it as a thumbnail. The editor also allows the user to specify an arbitrary region of the screen to confine the search to that region (c). Finally, the user can press the execute button (d); the editor is then hidden and the script is executed.
FUNCTIONS
3.1 Handling Applications
closeApp - exit - openApp - run - switchApp
3.1.1 openApp( application )
application: The name of an application (case-insensitive) that can be found in the environment variable PATH, or the full path to an application (Windows: use a double backslash \\ as the path separator).
Opens the application application and brings it to the front.
openApp("cmd.exe") # Windows: found through PATH
openApp("c:\\Program Files\\Mozilla Firefox\\firefox.exe") # Windows: full path specified
openApp("Safari") # Mac: opens Safari
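The double-backslash rule for Windows paths is ordinary Python string escaping rather than anything specific to Sikuli. The following minimal sketch, in plain Python with no Sikuli function called, shows that the escaped literal and a raw string literal denote the same path. The variable names are illustrative only.

```python
# Plain-Python illustration of the path-escaping rule; no Sikuli function is called.
escaped = "c:\\Program Files\\Mozilla Firefox\\firefox.exe"  # doubled backslashes
raw = r"c:\Program Files\Mozilla Firefox\firefox.exe"        # raw string literal

# Both literals denote the same characters, with single backslashes in the path.
same_path = (escaped == raw)
backslash_count = escaped.count("\\")
print(escaped)
```

Either form could presumably be passed wherever a path string is expected; the doubled form is the one used in the examples above.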
3.1.2 switchApp( application )
application: The name of an application (case-insensitive).
Switches to application application and brings it to the front. If the application is not running, it will be launched by openApp().
switchApp("cmd.exe") # Windows: switches to an open command prompt or starts one
switchApp("c:\\Program Files\\Mozilla Firefox\\firefox.exe") # Windows: opens a new browser window !! (since text cannot be found in the window title)
switchApp("mozilla firefox") # Windows: switches to the frontmost open browser window (no window open: does nothing !!)
switchApp("Safari") # Mac: switches to Safari or starts it
3.1.3 closeApp( application )
application: The name of an application (case-insensitive).
Closes the given application application. It does nothing if no opened window (Windows) or running app (Mac) can be found.
Note: On Windows, see the note with switchApp(). The whole application owning the matching window will be closed.
closeApp("cmd.exe") # Windows: closes an open command prompt
closeApp("c:\\Program Files\\Mozilla Firefox\\firefox.exe") # Windows: does nothing, since text cannot be found in the window title
closeApp("mozilla firefox") # Windows: stops Firefox including all its windows
closeApp("Safari") # Mac: closes Safari including all its windows
3.1.4 run( command )
command: a command that can be run from the command line.
Executes the command command. The script waits for completion.
3.1.5 exit()
Stops the script gracefully at this point.
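The blocking behaviour described for run() ("the script waits for completion") resembles what the standard library's subprocess.run does in plain Python. The sketch below is an analogy under that assumption, not Sikuli's own implementation; run_and_wait is a hypothetical helper name, and the example command simply invokes the current Python interpreter so that it works on any platform.

```python
import subprocess
import sys

def run_and_wait(args):
    """Run a command and block until it finishes; return (exit code, stdout)."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout.strip()

# Use the current interpreter as a portable example command.
code, output = run_and_wait([sys.executable, "-c", "print('done')"])
```

The call returns only after the child process has exited, which is the property the description of run() emphasises.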
3.2.2 input( [text] )
text: a string that is used as a message. If omitted, it is left blank.
Displays a dialog box with an input field, a Cancel button, an OK button and text as the message. The script waits for the user to click either Cancel or OK.
3.3.1 setBundlePath( path-to-a-folder )
path-to-a-folder: a fully qualified path to a folder containing the images used for finding patterns. Windows: use double backslashes.
Sets the path for searching images in all Sikuli Script methods. Sikuli IDE sets this automatically to the path of the folder where it saves the script (.sikuli). Therefore, you should use this function only if you really know what you are doing. Using it generally means that you would like to manage your captured images yourself.
3.3.2 getBundlePath()
returns: a string containing a fully qualified path to a folder containing the images used for finding patterns.
Note: Sikuli IDE sets this automatically to the path of the folder where it saves the script (.sikuli). You may use this function, for example, to package your private files together with the script or to access the picture files in the .sikuli bundle for other purposes. Sikuli only gives you access to the path name, so you may need other Python modules for I/O or other purposes.
3.3.3 setShowActions( False | True )
If set to True, when a script is run, Sikuli shows a visual effect for about 2 seconds on the spot where the action will take place before executing actions (e.g. click, dragDrop, type, etc.). The default setting is False.
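The note under getBundlePath() says that only a path string is provided and that other Python modules are needed for file access. A minimal sketch of that idea, using the standard os module, might look as follows. list_bundle_images is a hypothetical helper, and the temporary folder merely stands in for a real .sikuli bundle; in an actual script the path would come from getBundlePath().

```python
import os
import tempfile

def list_bundle_images(bundle_path):
    """Return the sorted names of the .png files stored in a bundle folder."""
    return sorted(name for name in os.listdir(bundle_path)
                  if name.lower().endswith(".png"))

# Build a throwaway folder that stands in for a .sikuli bundle.
with tempfile.TemporaryDirectory(suffix=".sikuli") as bundle:
    for name in ("button.png", "dialog.png", "script.py"):
        open(os.path.join(bundle, name), "w").close()
    images = list_bundle_images(bundle)
```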
Each method produces a new pattern, so they can be chained together. For example,
Pattern([image]).similar(0.8).anyColor().anySize()
matches screen regions that are 80% similar to the given image, of any size and of any color composition. Note that these pattern methods can impact the computational cost of the search: the more general the pattern, the longer it takes to find it.
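The statement that each method produces a new pattern is what makes chaining possible. The toy class below illustrates that design in plain Python. It is not Sikuli's Pattern class: ToyPattern, its field names, and the default similarity of 1.0 are all assumptions made for illustration, and no image matching is performed.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class ToyPattern:
    """Illustrative stand-in for a visual pattern; not Sikuli's Pattern class."""
    image: str
    min_similarity: float = 1.0   # assumed default, for illustration only
    any_color: bool = False
    any_size: bool = False

    def similar(self, threshold):
        # Return a copy with a new threshold; the original is left unchanged.
        return replace(self, min_similarity=threshold)

    def anyColor(self):
        return replace(self, any_color=True)

    def anySize(self):
        return replace(self, any_size=True)

base = ToyPattern("close_button.png")
relaxed = base.similar(0.8).anyColor().anySize()
```

Because every call returns a fresh object, base can still be reused for a strict search after relaxed has been built from it.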
3.4.3 Action
The action commands specify what keyboard and/or mouse events are to be issued to the center of a region found by find(). The set of commands currently supported in the API is:
click(Region), doubleClick(Region): These two commands issue mouse-click events to the center of a target region. For example, click([close button image]) performs a single click on the first close button found on the screen. Modifier keys such as Ctrl and Command can be passed as a second argument.
dragDrop(Region target, Region destination): This command drags the element in the center of a target region and drops it in the center of a destination region. For example, dragDrop([Word icon image], [recycle bin image]) drags a Word icon and drops it in the recycle bin.
type(Region target, String text): This command enters a given text in a target region by sending keystrokes to its center. For example, type([Google search box image], "UIST") types the string UIST in the Google search box.
3.4.4 Region
The Region class provides an abstraction for the screen region(s) returned by the find() function matching a given visual pattern. Its attributes are x and y coordinates, height, width, and similarity score. Typically, a Region object represents the top match. For example, r = find([image]) finds the region most similar to the given image and assigns it to the variable r. When used in conjunction with an iterative statement, a Region object represents an array of matches. For example, for r in find([image]) iterates through an array of matching regions, and the programmer can specify what operations to perform on each region represented by r. Another use of a Region object is to constrain the search to a particular region instead of the entire screen. For example,
find([dialog box image]).find([OK button image])
constrains the search space of the second find() for the OK button to only the region occupied by the dialog box returned by the first find().
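The attributes listed for a region (position, size, similarity score), the "center" that action commands target, and the idea of constraining a search to one region can be sketched with a small record type. ToyRegion and its methods are hypothetical names for illustration; the coordinates are made up, and the list of matches stands in for what an image search would return.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyRegion:
    """Illustrative region record: top-left corner, size, and match score."""
    x: int
    y: int
    width: int
    height: int
    score: float = 1.0

    def center(self):
        """Point that a click or keystroke would be sent to."""
        return (self.x + self.width // 2, self.y + self.height // 2)

    def contains(self, other):
        """True if `other` lies entirely inside this region."""
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.width <= self.x + self.width
                and other.y + other.height <= self.y + self.height)

dialog = ToyRegion(100, 100, 300, 200, score=0.95)
matches = [ToyRegion(320, 260, 60, 30, score=0.9),   # OK button inside the dialog
           ToyRegion(700, 500, 60, 30, score=0.9)]   # look-alike elsewhere on screen

# Constrain the search: keep only matches that fall inside the dialog.
inside = [m for m in matches if dialog.contains(m)]
```

Restricting candidates to the dialog's rectangle is one plausible reading of how a nested find() avoids clicking a look-alike button elsewhere on the screen.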
EXTENSIONS
4.1 How to Download and use:
The IDE supports downloading an extension through the menu Tools > Extensions. A popup lists the available and already installed extensions and lets you download new packages or updates for installed ones. This popup shows a new package not yet installed:
If you need more information about the features of the extension, just click More Info - this will open the related documentation from the web in a browser window. If you want to install the extension, just click the Install... button. The package will be downloaded and added to your extensions repository. This popup shows an installed package:
If a new version is available at that time, the Install... button is active again and shows the new version number. You can then click it to download the new version.
How to Use an Extension
To use the features of an installed extension in one of your scripts, just write from extensionname import *. For a usage example, read Sikuli Guide. For information about features, usage and API, use the menu Tools -> Extensions -> More Info in the IDE.
The final structure of a JAR (filename extension-name-X.Y where X.Y is the version string) looks like this:
org/com - your-organization-or-company
-- extension-name
--- yourClass1.class
--- yourClass2.class
--- .... more classes
extension-name
- __init__.py
- extension-name.py
META-INF
- MANIFEST.MF
The file __init__.py contains at least from extension-name import * to avoid one qualification level. So in a script you might either use:
import extension-name
extension-name.functionXYZ()
or:
from extension-name import *
functionXYZ()
The second case requires more investment in a naming convention that avoids naming conflicts. The file extension-name.py contains the classes and methods that represent the API that one might use in a Sikuli script. As an example, you may take the source of the extension Sikuli Guide.
WORKING PROCEDURE
5.1 How It Works
Saving
.sikuli (Recognized as source code and opened in editor) Consists of python file (.py) and all (.png) images used. Also creates (.html) file for easy web sharing.
Executing
.skl (Executable script, zipped .sikuli directory) Recognized and run without opening IDE
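Since the .skl file is described as a zipped .sikuli directory, the packaging step can be sketched with the standard zipfile module. This is only an illustration under that description: pack_bundle is a hypothetical helper, the file names are invented, and the exact internal layout that the Sikuli IDE writes is an assumption not confirmed here.

```python
import os
import tempfile
import zipfile

def pack_bundle(bundle_dir, archive_path):
    """Zip every file in bundle_dir into archive_path; return the stored names."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(bundle_dir)):
            archive.write(os.path.join(bundle_dir, name), arcname=name)
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()

with tempfile.TemporaryDirectory() as workdir:
    # A stand-in .sikuli folder holding a script, one image, and the HTML export.
    bundle = os.path.join(workdir, "demo.sikuli")
    os.mkdir(bundle)
    for name in ("demo.py", "button.png", "demo.html"):
        open(os.path.join(bundle, name), "w").close()
    stored = pack_bundle(bundle, os.path.join(workdir, "demo.skl"))
```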
Jython Encapsulation
End-User commands
Scope: Static
Memory Management / Variables & Bindings: Heap dynamic. All objects and data structures are handled by the interpreter, with no user control. malloc(), realloc(), free() etc. can be called by importing the C library, but this results in mixed calls between the C allocator and the Python memory manager.
Garbage Collection: Reference count
Data Types & Type Checking: No type checking; data types exist, but pointers are changed freely. Methods can require a specific type, which is then checked.
Comments: # This is a comment in Python
APPLICATIONS
Minimizing All Active Windows
Deleting Documents of Multiple Types
Tracking Bus Movement
Navigating a Map
Responding to Message Boxes Automatically
Monitoring a Baby
EXAMPLES
This script minimizes all active windows by calling find repeatedly in a while loop (1) and calling click on each minimize button found (2), until no more can be found.
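The loop structure of this first script can be illustrated without a real screen. In the simulation below, find_button() and click() are hypothetical stand-ins for Sikuli's find() and click(), and the "screen" is just a list of button positions invented for the example; the comments (1) and (2) mirror the step numbers in the description.

```python
# Simulated screen: each entry is the position of one visible minimize button.
# find_button() and click() are hypothetical stand-ins for find() and click().
screen = [(40, 10), (400, 10), (760, 10)]
clicked = []

def find_button():
    """Return the first minimize button still on screen, or None."""
    return screen[0] if screen else None

def click(position):
    """Minimizing a window removes its button from the simulated screen."""
    screen.remove(position)
    clicked.append(position)

while True:
    target = find_button()   # (1) look for a minimize button
    if target is None:
        break                # nothing left to minimize
    click(target)            # (2) click it
```

The loop terminates because every click removes one button, which matches the "until no more can be found" condition.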
This script deletes all visible Office files (Word, Excel, PowerPoint) by moving them to the recycle bin. First, it defines a function recycleAll() to find all icons matching the pattern of a given file type and move them to the recycle bin (1-3). Since icons may appear in various sizes depending on the view setting, anySize is used to find icons of other sizes (2). A for loop iterates through all matching regions and calls dragDrop to move each match to the recycle bin (3). Next, an array is created to hold the patterns of the three Office file types (4) and recycleAll() is called on each pattern (5-6) to delete the files. This example demonstrates Sikuli Script's ability to define reusable functions, treat visual patterns as variables, perform fuzzy matching (anySize), and interact with built-in types (array).
This script tracks bus movement in the context of a GPS-based bus tracking application. Suppose a user wishes to be notified when a bus is just around the corner so that the user can head out and catch the bus. First, the script identifies the region corresponding to the street corner (1). Then, it enters a while loop and tries to find the bus marker inside the region every 60 seconds (2-3). Notice that about 30% of the marker is occupied by the background, which may change as the marker moves. Thus, the similar pattern modifier is used to look for a target 70% similar to the given pattern. Once such a target is found, a popup is shown to notify the user that the bus is arriving (4). This example demonstrates the use of Sikuli Script for everyday tasks.
6.2.4 Navigating a Map:
This script automatically navigates east to Houston following Interstate 10 on the map (by dragging the map to the left). A while loop repeatedly looks for the Interstate 10 symbol and checks if a string Houston appears nearby (1). Each time the string is not found, the position 100 pixels to the left of the Interstate 10 symbol is calculated and the map is dragged to that position (3), which in effect moves the map to the east. This movement continues until the Interstate 10 can no longer be found or Houston is reached.
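The arithmetic and stopping conditions of the map script can be sketched as a small simulation. The 100-pixel offset comes from the description; everything else is hypothetical, including the function names, the sample coordinates, and the way the appearance of "Houston" is modelled as a step index.

```python
DRAG_DISTANCE = 100  # pixels, taken from the description above

def drag_target(symbol_center):
    """Point 100 pixels to the left of the Interstate 10 symbol."""
    x, y = symbol_center
    return (x - DRAG_DISTANCE, y)

def navigate(symbol_positions, houston_visible_at):
    """Simulate the loop: symbol_positions[i] is where the symbol is found
    on step i (None if not found); houston_visible_at is the step at which
    the string "Houston" appears nearby."""
    drags = []
    for step, center in enumerate(symbol_positions):
        if center is None or step == houston_visible_at:
            break  # symbol lost, or destination reached
        drags.append((center, drag_target(center)))
    return drags

drags = navigate([(500, 300), (480, 310), (510, 305)], houston_visible_at=2)
```

Each recorded pair is a (from, to) drag; dragging the map leftward is what moves the view east.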
This script generates automatic responses to a predefined set of message boxes. A screenshot of each message box is stored in a visual dictionary d as a key, and the image of the button to press automatically is stored as its value. A large number of message boxes and desired responses are defined in this way (1-100). Suppose the win32gui library is imported (101) to provide the function getActiveWindow(), which is called periodically (102) to obtain the handle to the active window (103). Then, we take a screenshot by calling getScreenshot() (104) and check if it is a key of d (105). If so, this window must be one of the message boxes specified earlier. To generate an automatic response, the relevant button image is extracted from d (106), and the region inside the active window matching the button image is found and clicked (107). This example shows that Sikuli Script can interact with any Python library to accomplish tasks that neither can do alone.
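The lookup at the heart of this script, a dictionary keyed by screenshots, can be approximated in plain Python. The sketch below is a simplification under a stated assumption: it uses exact-match keys (tuples of pixel rows), whereas a real visual dictionary would need fuzzy image matching, which is not shown. The tiny "screenshots", the file names, and respond() are all invented for illustration.

```python
# Screenshots are modelled as tuples of pixel rows so they can be dict keys.
# Assumption: exact-match lookup; a real visual dictionary would need fuzzy
# image matching, which is not shown here.
SAVE_PROMPT = ((1, 1, 0), (0, 1, 0))
ERROR_BOX = ((0, 0, 1), (1, 0, 1))
UNRELATED = ((1, 1, 1), (1, 1, 1))

responses = {
    SAVE_PROMPT: "no_button.png",   # message box -> button to press
    ERROR_BOX: "ok_button.png",
}

def respond(active_window_screenshot):
    """Return the button image to click, or None if the window is not listed."""
    return responses.get(active_window_screenshot)

button = respond(ERROR_BOX)
ignored = respond(UNRELATED)
```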
This script demonstrates how visual scripting can go beyond the realm of the desktop to interact with the physical world. The purpose of this script is to monitor for baby rollover through a webcam that streams video to the screen. A special green marker is placed on the baby's forehead. By periodically checking if the marker is present (1-2), the script can detect baby rollover when the marker is absent and issue a notification (3).
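The detection logic, checking each frame for the marker and reacting when it disappears, can be sketched with simulated frames. Here a frame is a list of text rows and the letter "G" stands in for the green marker; both are assumptions made so the example runs without a webcam, and the function names are hypothetical.

```python
GREEN = "G"  # stand-in for the green marker's colour

def marker_present(frame):
    """True if any row of the frame shows the marker colour."""
    return any(GREEN in row for row in frame)

def first_rollover(frames):
    """Index of the first frame without the marker, or None if always present."""
    for index, frame in enumerate(frames):   # (1)-(2) periodic check
        if not marker_present(frame):
            return index                     # (3) issue the notification here
    return None

frames = [["..G.", "...."], [".G..", "...."], ["....", "...."]]
alert_at = first_rollover(frames)
```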
7.1 Evaluation:
To evaluate Sikuli Test, we performed a testability analysis (how diverse the visual behaviors are that GUI testers can test automatically) and a reusability analysis (how likely testers are to be able to reuse a test script as a GUI evolves).
In addition to these valid visual behaviors, there are 307 rarely paired, improbable visual behaviors, indicated by an X. As can be seen, the majority of the valid visual behaviors considered in this analysis can be tested by Sikuli Test. However, complex visual behaviors such as those involving animation (e.g., fading) are currently not testable, which is a topic for future work.
LIMITATIONS
FUTURE DEVELOPMENT
Automating scrolling or tab-switching actions to bring GUI elements into view so that they can be interacted with visually
Fast and accurate OCR on screen
Accessibility API integration
Platform Independence:
Works on ANY GUI that can be displayed on Windows/Linux/Mac
Virtual machines
Remote desktop
Mobile simulators: Android, iPhone
Web: Flash, HTML+Javascript
UI: visible, familiar, always exists
API: faster, probably more stable
The semantic gap between the test scripts and the test tasks automated by the scripts is small. It is easy to read a test script and understand what GUI feature the script is designed to test.
WHO IS USING IT?
REFERENCES
http://www.makeuseof.com/tag/create-automation-scripts-easilyscreenshots/
T. Yeh, T.-H. Chang, and R. C. Miller. Sikuli: Using GUI screenshots for search and automation. In UIST '09, pages 183-192. ACM, 2009.
http://hcc.cc.gatech.edu/documents/104_Edwards_week2.pdf