Breaking a Visual CAPTCHA

Seminar Fall 2006/07

Dr. Igor Fischer


Telecommunications Lab
Saarland University
http://www.nt.uni-saarland.de/
Outline

• General and formal stuff


– Who we are and what we do
– How the seminar is organized
– What we expect from you
• About the subject
– What is a CAPTCHA and how is it used
– Seminar topics



About the Telecommunications Lab

• Research projects:
– QoS in wireless home networks
– Binaural computational source separation
– MiniWatt: estimating EM radiation
– Binary coding of XML metadata (for DVB)
– Visual authentication of digital documents
• People
– Chair, technical and administrative assistants,
five researchers, ten students



Relevant Staff
• Prof. Dr. Thorsten Herfet
– Lab chair
• Zakaria Keshta
– System administrator
• Jochen Miroll
– Assistant system administrator
• Igor Fischer
– Seminar supervisor

• Other staff and details available at
http://www.nt.uni-saarland.de/people/
Formal Stuff About the Seminar

• 2 SWS, 8 credit points ⇒ 200–215 hours of work


• Overall organization:
– Introductory Lecture (today)
• Present the big picture and show how the various components are linked
• Give a summary of potential topics to choose from
– Knowledge gathering and planning phase (the next three weeks)
• Reading and understanding the literature
• Defining the system blocks and their interfaces
• Writing a short summary of a topic
– Implementation phase (Matlab, until 23.01.2007)
• Writing the interfaces, ensuring the blocks properly communicate
• Writing the block’s implementation, documenting the code
• Testing and debugging
– Presentation phase (12.12.2006–07.02.2007)
• Write summary and prepare presentation (until 30.01.2007)
• Present the work (block seminar 06.–07.02.2007)
• Language:
– English preferred (required for program documentation and slides)



The Purpose of the Seminar
• Knowledge Transfer
– Obtain understanding of the purpose and meaning of visual CAPTCHAs
– Get an extensive overview of attacks on visual CAPTCHAs
– Get in-depth knowledge of one pipeline step
– Prepare for further work on the subject
• Diploma / Master's theses, student research assistant positions
• Experience the developer’s toolbox
– Project planning
• Read and understand literature on breaking visual CAPTCHAs
• Coordinate and cooperate with other students
– Implementation
• Develop a block in the CAPTCHA-breaking software pipeline
• Document the software
– Presentation
• Squeeze highly technical details into an easy-to-understand summary
• Demonstrate and explain the block



What Do We Expect From You
• Reading and understanding the literature:
– Get a basic understanding of the whole pipeline
– Understand your block fully and in depth
• Summarizing the block:
– Purpose and properties
– Interfaces (in/output)
• Programming and testing the block
– Interfaces and the implementation
– Test for correctness and for robustness
– Document the code
– Check it into the repository regularly (Subversion)
• Documenting the work
– Write the description of your block and describe use cases
– Write the presentation (e.g. PowerPoint)
– Present your block understandably to the audience



Organizational Stuff
• Consultations: by appointment only

• Programming in Matlab
– Matlab-conformant comments for all functions (!!!)

• Version control system: Subversion


– First: create user name and password:
https://svn.nt.uni-saarland.de/seminar-pw/
Login: tc-seminar Password: captcha

– Test your work before checking in


– Check in only stable code (not necessarily bug-free, but shouldn’t crash
with “Division by zero” or similar)

• Seminar paper: write using LaTeX


– Style to be announced



Introduction to the subject
What is a CAPTCHA?

• Acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart”
– Generally: a problem which humans can solve, but computers cannot (at least so we hope)
– Visual CAPTCHA: a visual problem for which humans can easily give an answer, but not computers



Uses of CAPTCHAs

• Spam prevention
– Screening for human users of web mail
accounts
– Blocking automated registration of web sites
with search engines
• Document authentication
– Preventing automated manipulations of
documents



Seminar Topics
• Correspond to the blocks in the CAPTCHA-breaking pipeline:
1. Generalized shape context
2. Fast candidate pruning
3. Detailed matching (bipartite graph matching)
4. Finding letter hypotheses
5. Extracting candidate words
6. Scoring the candidates and choosing the best one (simple)
7. Candidate pruning with bigrams
8. Layers of words
9. Scoring the candidates and choosing the three best
10. Recovering the background

• Literature:
– Mori and Malik, “Recognizing Objects in Adversarial Clutter: Breaking a
Visual CAPTCHA” (and references)
– Efros and Leung, “Texture Synthesis by Non-parametric Sampling”



Topics 1-2

• Generalized shape context:


– Based on simple shape context
– Simple and robust representation of objects
(letters in our case)

• Fast candidate pruning:


– Eliminating unlikely candidates based on a
random subset of their generalized shape
contexts
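
The two blocks above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the seminar implementation itself is in Matlab), and it makes several assumptions that are not from the slides: it computes the simple log-polar shape context (point counts per bin) rather than the generalized variant, it compares histograms with a chi-squared distance, and all names (`shape_context`, `prune_candidates`, the toy templates) are hypothetical.

```python
import math
import random


def mean_distance(points):
    """Mean pairwise distance, used to make the descriptor scale-invariant."""
    pairs = [(a, b) for i, a in enumerate(points) for b in points[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def shape_context(points, index, scale, n_r=3, n_theta=8, r_min=0.125, r_max=2.0):
    """Normalised log-polar histogram of all other points, seen from points[index]."""
    hist = [0] * (n_r * n_theta)
    px, py = points[index]
    for j, (qx, qy) in enumerate(points):
        if j == index:
            continue
        r = math.hypot(qx - px, qy - py) / scale
        r = min(max(r, r_min), r_max)  # clamp into the histogram range
        r_bin = min(int(n_r * math.log(r / r_min) / math.log(r_max / r_min)), n_r - 1)
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        t_bin = min(int(n_theta * theta / (2 * math.pi)), n_theta - 1)
        hist[r_bin * n_theta + t_bin] += 1
    total = len(points) - 1
    return [count / total for count in hist]


def chi2(h1, h2):
    """Chi-squared distance between two histograms (assumed cost measure)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def prune_candidates(query, templates, n_samples=3, keep=2, seed=0):
    """Rank templates using only a random subset of the query's descriptors."""
    rng = random.Random(seed)
    q_scale = mean_distance(query)
    sample = rng.sample(range(len(query)), n_samples)
    q_desc = [shape_context(query, i, q_scale) for i in sample]
    costs = {}
    for name, pts in templates.items():
        t_scale = mean_distance(pts)
        t_desc = [shape_context(pts, i, t_scale) for i in range(len(pts))]
        # Each sampled query descriptor contributes its best match on the template.
        costs[name] = sum(min(chi2(q, t) for t in t_desc) for q in q_desc)
    return sorted(costs, key=costs.get)[:keep]


# Toy edge-point sets standing in for letter templates (hypothetical data).
templates = {
    "L": [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 0), (2, 0)],
    "I": [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)],
    "O": [(1, 0), (2, 0), (3, 1), (3, 2), (2, 3), (1, 3), (0, 2), (0, 1)],
}
# The query is the "L" template, scaled by 2 and shifted.
query = [(2 * x + 5, 2 * y + 3) for x, y in templates["L"]]
print(prune_candidates(query, templates, keep=2))
```

Only the candidates that survive this cheap ranking would be passed on to the more expensive detailed matching step.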



Topics 3-4

• Detailed matching
– Uses deformable template approach
– Fits observed shape to one in the database

• Finding letter hypotheses


– Guessing probable letters and their positions
in the image
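
The bipartite matching behind the detailed matching step can be sketched as an assignment problem: every observed point is paired with a distinct template point so that the total cost is minimal. The sketch below is an illustration under assumptions, not the seminar's method: it uses Python instead of Matlab, plain Euclidean distance as a stand-in for a shape-context cost, and a brute-force search over permutations that is only practical for a handful of points. The function name and the sample points are hypothetical.

```python
import math
from itertools import permutations


def best_assignment(cost):
    """Return (total_cost, columns), where columns[i] is the column matched to row i.

    Brute force over all one-to-one assignments; requires rows <= columns.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    best_total, best_cols = math.inf, None
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[i][j] for i, j in enumerate(cols))
        if total < best_total:
            best_total, best_cols = total, cols
    return best_total, best_cols


# Hypothetical data: the template is a shuffled, slightly perturbed copy.
observed = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
template = [(0.1, 1.0), (0.0, 0.1), (0.9, 0.0)]

# Cost matrix: Euclidean distance (assumed stand-in for a descriptor distance).
cost = [[math.dist(o, t) for t in template] for o in observed]
total, assignment = best_assignment(cost)
print(assignment, round(total, 3))
```

For realistic point counts, a polynomial-time assignment solver such as the Hungarian algorithm would replace the permutation loop.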



Topics 5-6

• Extracting candidate words


– Constructing a directed acyclic graph over
letter hypotheses
– Using trigrams and a dictionary for pruning

• Choosing the best candidate


– For each letter compute the deformable
matching cost
– Word score is the average of letter scores
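
A minimal sketch of these two blocks, in Python for illustration only (the seminar code is in Matlab). The assumptions are labelled here and in the comments: a letter hypothesis is a hypothetical `(letter, x_position, matching_cost)` tuple, an edge links two hypotheses when their horizontal gap lies in an assumed range, and a lower cost is taken to mean a better match. The gap limits and the sample data are invented for the example.

```python
from statistics import mean


def trigrams_of(words):
    """All three-letter substrings that occur in the dictionary."""
    return {w[i:i + 3] for w in words for i in range(len(w) - 2)}


def candidate_words(hypotheses, dictionary, min_gap=5, max_gap=15):
    """Enumerate left-to-right paths over letter hypotheses and score dictionary words.

    Edges only point to hypotheses further right, so the graph is acyclic.
    Returns (word, score) pairs sorted by score, lowest (best) first.
    """
    allowed = trigrams_of(dictionary)
    results = {}

    def extend(path):
        word = "".join(h[0] for h in path)
        if len(word) >= 3 and word[-3:] not in allowed:
            return  # trigram pruning: no dictionary word contains this triple
        if word in dictionary:
            score = mean(h[2] for h in path)  # word score = average letter cost
            results[word] = min(score, results.get(word, score))
        last_x = path[-1][1]
        for h in hypotheses:
            if min_gap <= h[1] - last_x <= max_gap:  # assumed adjacency rule
                extend(path + [h])

    for h in hypotheses:
        extend([h])
    return sorted(results.items(), key=lambda item: item[1])


# Hypothetical letter hypotheses: (letter, x position, deformable matching cost).
hypotheses = [
    ("c", 0, 0.2), ("e", 0, 0.6),
    ("a", 10, 0.3), ("o", 10, 0.5),
    ("t", 20, 0.1), ("r", 20, 0.4),
]
dictionary = {"cat", "car", "cot", "eat", "ear"}

for word, score in candidate_words(hypotheses, dictionary):
    print(word, round(score, 3))
```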



Topics 7-8

• Pruning with bigrams


– Uses two adjacent letters at the beginning or end of a word
• More robust against complex Gimpy CAPTCHAs, with several overlapping words

• Layers of words
– Given the guess of one word, try to recover
the one beneath it
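
The bigram pruning idea can be sketched as a dictionary filter: keep only the words whose first or last two letters were detected in the image. This is a hedged Python illustration (the seminar code is in Matlab). The function name, the `require_both` switch, and the sample word list are assumptions; the slides do not say whether a match at one end is enough.

```python
def prune_by_bigrams(dictionary, start_bigrams, end_bigrams, require_both=False):
    """Keep words whose opening or closing letter pair was detected.

    With require_both=True a word must match at both ends (assumed variant).
    """
    kept = []
    for word in dictionary:
        starts = word[:2] in start_bigrams
        ends = word[-2:] in end_bigrams
        if (starts and ends) if require_both else (starts or ends):
            kept.append(word)
    return sorted(kept)


# Hypothetical dictionary and detected bigrams.
words = ["round", "sound", "rough", "table", "found"]
print(prune_by_bigrams(words, {"ro"}, {"nd"}))
print(prune_by_bigrams(words, {"ro"}, {"nd"}, require_both=True))
```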
Topics 9-10

• Choosing the best three candidates


– Produce a synthetic image of two overlapping
words
– Compute the shape contexts and score
separately

• Removing the text and reconstructing the background
– Use texture synthesis
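
Background recovery by texture synthesis can be sketched as hole filling: each removed pixel is replaced by the centre of the known window that best matches its already-known neighbours. The Python sketch below is a simplified, deterministic variant in the spirit of non-parametric sampling, not the algorithm from the cited paper: it always copies the single best match, weights all neighbours equally, and works on a tiny grey-value grid with `None` marking removed pixels. All names and the striped test image are hypothetical.

```python
def synthesize(image, half=1):
    """Fill None pixels of a 2D list; the input image is left unchanged."""
    h, w = len(image), len(image[0])
    img = [row[:] for row in image]
    offsets = [(dy, dx) for dy in range(-half, half + 1)
               for dx in range(-half, half + 1) if (dy, dx) != (0, 0)]

    def known_neighbours(y, x):
        """(dy, dx, value) for every in-bounds neighbour that is already known."""
        return [(dy, dx, img[y + dy][x + dx]) for dy, dx in offsets
                if 0 <= y + dy < h and 0 <= x + dx < w
                and img[y + dy][x + dx] is not None]

    # Source windows: centres whose full window is known in the original image.
    sources = [(y, x) for y in range(half, h - half) for x in range(half, w - half)
               if image[y][x] is not None
               and all(image[y + dy][x + dx] is not None for dy, dx in offsets)]
    if not sources:
        raise ValueError("no fully known window to sample from")

    holes = {(y, x) for y in range(h) for x in range(w) if img[y][x] is None}
    while holes:
        # Fill the pixel with the most known neighbours first.
        y, x = max(sorted(holes), key=lambda p: len(known_neighbours(*p)))
        known = known_neighbours(y, x)

        def ssd(src):
            sy, sx = src
            total = sum((image[sy + dy][sx + dx] - v) ** 2 for dy, dx, v in known)
            return total / max(len(known), 1)  # avoid division by zero

        by, bx = min(sources, key=ssd)
        img[y][x] = image[by][bx]
        holes.remove((y, x))
    return img


# Hypothetical background: vertical stripes with a 2x2 hole where text was removed.
stripes = [[255 * (x % 2) for x in range(8)] for _ in range(8)]
damaged = [row[:] for row in stripes]
for y in (3, 4):
    for x in (3, 4):
        damaged[y][x] = None

restored = synthesize(damaged)
print(restored == stripes)
```

Choosing randomly among several near-best windows, rather than always the best one, would avoid visibly repeated patches on larger holes.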



Thank you for your attention

Please fill in the form so that we can make the best topic assignment
