
Data Compression using Wireless Distributed Computing
Ivan Pillay, Bhushan Mali, Siddhesh Pande, Shubham Soni, Prof. Shilpa Pimpalkar
ivan24400@gmail.com, bhushanmali96@gmail.com, siddheshpandes@gmail.com
Department of Computer Engineering
AISSMS IOIT, Pune

ABSTRACT
Since the last decade we have seen explosive growth in smartphone technology, especially in computational power. Some users also switch to a new smartphone every one or two years, while they either do not use their previous phone or use it rarely. We can use these idle computational resources to save energy on our current phone by using a wireless distributed computing method. In this approach we distribute a task from the user's current smartphone to its nearby idle devices in order to save resources, such as battery, on the user's device. One such task that consumes a lot of CPU and battery is compression. A lot of data, especially media, is generated on smartphones and fills up a user's device quickly, which may hinder the performance of other applications on the device. In this paper we discuss how we can offload the task of compression onto nearby devices, which saves CPU as well as battery resources and also reduces the size of the file so that it consumes less storage space on the device.

Keywords
Data compression, Peer-to-peer computing, Distributed computing, Wireless networks.

1. INTRODUCTION
During the past few years we have seen tremendous growth in the information technology and computer science fields. Devices and phones are getting smaller, more efficient and cheaper. Tasks which were at one point done manually by a human, such as calculation, are now executed by applications installed on our smartphones. Even advanced applications such as video decoding or network access can now run on our pocket-sized phones. An average smartphone as of 2018 comes with a 1.7 GHz CPU, 3 GB of RAM, 32 GB of internal storage, a 3200 mAh battery and optionally a GPU. Such a specification does not constrain applications from using or generating large amounts of data.

A huge amount of data quickly populates the device's storage, and memory is managed up to a certain extent by the operating system. The operating system itself occupies a fair amount of memory and storage space and may allow only a fixed number of applications to stay active in memory. This may result in an application that is necessary for the user starving for resources. Thus, to tackle this problem we can compress some large files, but as compression is a CPU-intensive task we can distribute it to nearby idle devices. Most smartphones are equipped with a WiFi chipset which allows multiple devices to communicate wirelessly. We can use this wireless technology to distribute the required file to nearby devices. Each of the nearby devices will compress a partition (or the original file itself) and store it locally. The compressed output is then sent back to the master or distributor device, which combines the parts in sequence to form a single compressed file.

2. RELATED WORK
The paper in [5] discusses load balancing techniques for wireless devices. Their proposed Adaptive Load Balancing (ALB) algorithm works by initializing all available nodes Ng. A master node N1 is selected that has knowledge of all other nodes, including their channel conditions and available energy levels. The master node then decides how data should be routed or processed by selecting nodes that have good processing power as well as good channel conditions. The ALB algorithm distributes tasks based on the available energy levels of the selected nodes, i.e. nodes with good channel conditions get a larger share of the task than nodes with poor conditions. This algorithm is used in spectrum sensing technology.

The paper in [7] presents compression of textual data on mobile devices. It applies a few algorithms and compares the results among them to determine which algorithm is suitable for different types of textual data. It uses the Burrows-Wheeler transform, Move-to-front and Arithmetic coding techniques to compress data, and it also demonstrates the usage of these algorithms on mobile phones.

The paper in [10] discusses a scalable framework for wireless distributed computing. The algorithm divides a huge database into pieces small enough to be stored on each of the nodes. This reduces the total number of access requests the nodes need in order to reach the database. Each node processes the data that is locally available and shares its results with the other nodes. The other nodes then use these intermediate values for their own operations, thus reducing the computational time required to process the entire data set.



3. SYSTEM ARCHITECTURE

Figure 1: System Architecture

As seen in Figure 1, the Master device M creates a network which all nearby Slave devices join using Wi-Fi. Slave devices automatically detect M. Once the peer count is finalized, the share of work for each device is calculated based on its free space and battery level. Parts of the file are sent to each device in the peer group according to its allocated size. Device Y will not be allocated any task because of insufficient resources, so the compression job will be divided between J and S only.

Upon receiving its data, each peer device (i.e. J and S) compresses it and then sends the result back to the source address, i.e. to the Master device M. After confirming data reception from each peer device, M concatenates all the parts to create a single compressed file. All temporary files created at the peer devices are deleted after the operation.
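To make the allocation step concrete, the sketch below shows one way the master's share calculation could look. It is an illustration only, not the application's actual code; the peer fields (free_space in bytes, battery as a fraction) and the eligibility thresholds are assumptions.

```python
# Minimal sketch of the master-side partitioning step. Each slave is assumed
# to report its free storage (bytes) and battery level (0.0 - 1.0); peers with
# too few resources (like device Y above) are given no chunk at all.

def allocate_chunks(file_size, peers, min_free=1 << 20, min_battery=0.2):
    """Return {peer_id: chunk_size} proportional to free_space * battery."""
    eligible = {p: info for p, info in peers.items()
                if info["free_space"] >= min_free and info["battery"] >= min_battery}
    total_weight = sum(i["free_space"] * i["battery"] for i in eligible.values())
    shares, assigned = {}, 0
    for idx, (peer, info) in enumerate(eligible.items()):
        if idx == len(eligible) - 1:
            shares[peer] = file_size - assigned      # last peer takes the remainder
        else:
            weight = info["free_space"] * info["battery"] / total_weight
            shares[peer] = int(file_size * weight)
            assigned += shares[peer]
    return shares

if __name__ == "__main__":
    peers = {
        "J": {"free_space": 8 << 30, "battery": 0.9},
        "S": {"free_space": 4 << 30, "battery": 0.6},
        "Y": {"free_space": 100 << 10, "battery": 0.1},   # insufficient resources
    }
    print(allocate_chunks(50 << 20, peers))               # Y receives nothing
```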

4. ALGORITHMS

4.1 Deflate
Deflate is a lossless data compression algorithm. The input data is first passed through the LZ77 algorithm, which is a dictionary-based compressor. The output of LZ77 is then passed through a Huffman encoder, which is a variable-length coder.
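Deflate is available in Python's standard zlib module, so the pipeline described above can be exercised directly. The snippet below is only a usage illustration and is not part of the application's code.

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog " * 100

# zlib implements Deflate: LZ77 matching followed by Huffman coding.
compressed = zlib.compress(data, level=9)
restored = zlib.decompress(compressed)

assert restored == data
print(len(data), "->", len(compressed), "bytes")
```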
4.2 dcrz
This is another algorithm that we have used in the application; it is a combination of the following algorithms.

4.2.1 Run Length
This algorithm stores a run of a single character as length,character. It can store a maximum of four literals. To decode, just read the first character to determine whether it is a run or a literal and perform the appropriate operation.

Eg: input: aaaaabbb
output: 5a3bbb
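A minimal run-length coder in the spirit of this description is sketched below. Note that it always emits a count followed by a single character, so its output format differs slightly from the dcrz literal handling shown in the example above; it is an illustration, not the dcrz source.

```python
from itertools import groupby

def rle_encode(text):
    """Encode each run as <count><character>, e.g. 'aaaaabbb' -> '5a3b'."""
    return "".join(f"{len(list(group))}{ch}" for ch, group in groupby(text))

def rle_decode(encoded):
    """Inverse of rle_encode; counts may span several digits."""
    out, count = [], ""
    for ch in encoded:
        if ch.isdigit():
            count += ch
        else:
            out.append(ch * int(count))
            count = ""
    return "".join(out)

if __name__ == "__main__":
    sample = "aaaaabbb"
    enc = rle_encode(sample)
    print(enc)                        # 5a3b
    assert rle_decode(enc) == sample
```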
4.2.2 Move to Front
Whenever the coder sees a symbol (from a predefined list of symbols), it outputs the symbol's location/index in the list, deletes that symbol from the list and inserts the same symbol at the front of the list. This makes consecutive streams of symbols or patterns appear as runs of the same (usually 0) symbol. The output is a sequence of indices into the character set. To decode, we initialize the same character set that we used for the input, read one index, and output the symbol stored at that index in the character set. The symbol at that index is then removed from its location and inserted at the top of the list.

Eg: charset = { a, b, c, d, e, f } (1-based)
input = badc

   Symbols | Index      | CharSet after output
1. b       | 2          | { b, a, c, d, e, f }
2. ba      | 2, 2       | { a, b, c, d, e, f }
3. bad     | 2, 2, 4    | { d, a, b, c, e, f }
4. badc    | 2, 2, 4, 4 | { c, d, a, b, e, f }
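The worked example above can be reproduced with a short sketch of the move-to-front step (an illustration only, using the same 1-based indexing).

```python
def mtf_encode(text, charset):
    """Emit the 1-based index of each symbol, then move that symbol to the front."""
    table, out = list(charset), []
    for ch in text:
        idx = table.index(ch)
        out.append(idx + 1)                    # 1-based, as in the example above
        table.insert(0, table.pop(idx))
    return out

def mtf_decode(indices, charset):
    table, out = list(charset), []
    for idx in indices:
        out.append(table[idx - 1])
        table.insert(0, table.pop(idx - 1))
    return "".join(out)

if __name__ == "__main__":
    charset = ["a", "b", "c", "d", "e", "f"]
    codes = mtf_encode("badc", charset)
    print(codes)                               # [2, 2, 4, 4]
    assert mtf_decode(codes, charset) == "badc"
```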
4.2.3 Burrows Wheeler Transform
This algorithm takes a block of symbols as input and creates a square matrix whose size equals the length of the block. It inserts the block as the first row and moves the pointer to the next row. It performs a single rotate operation on the previous row and stores the result in the current row; this process is repeated until the number of rows equals the length of the block. Finally, it sorts all rows lexicographically and outputs the last column of the matrix together with the index of the original input in the matrix. The output contains all the symbols that were in the input, but most of the recurring symbols are arranged next to each other. To decode, we again create a square matrix of the input's length, add the output of BWT as the last column and sort the matrix; this procedure is repeated length times. The row at the stored index is the original string.

Eg: input = ^banana$ (where ^ and $ have least value)
output = bnn^aa$a ; key = 6
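A naive sketch of the transform as described (build all rotations, sort them, keep the last column plus the row index of the original block) is shown below. It uses its own example string "banana$" with plain ASCII ordering rather than the ^/$ convention of the example above.

```python
def bwt_encode(block):
    """Naive BWT: sort every rotation, return the last column and the original row index."""
    n = len(block)
    rotations = sorted(block[i:] + block[:i] for i in range(n))
    key = rotations.index(block)               # row holding the unrotated block
    return "".join(row[-1] for row in rotations), key

def bwt_decode(last_column, key):
    """Rebuild the matrix column by column, as described above."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    return table[key]

if __name__ == "__main__":
    encoded, key = bwt_encode("banana$")
    print(encoded, key)                        # annb$aa 4  ('$' sorts before the letters)
    assert bwt_decode(encoded, key) == "banana$"
```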
4.2.4 Huffman Coding
This algorithm constructs a binary tree whose leaf nodes are the unique symbols present in the input stream. Leaf nodes with the greatest frequency in the input stream have the shortest depth, and vice versa. It then scans the input again from the beginning, one symbol at a time, and for each symbol outputs a variable-length code derived from the location of that symbol in the tree: it traverses the tree and outputs 0 for each left branch and 1 for each right branch. For each block of input it also stores the tree or the frequency codes in the output. To decode, the algorithm reconstructs the tree from this metadata and scans the input bit by bit until an appropriate symbol is found. As Huffman codes are prefix codes, there exists no symbol whose code is a prefix of another symbol's code.
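A compact sketch of the code-construction step, using a binary heap of frequencies, is given below; it is illustrative only and is not the dcrz implementation.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman tree and return {symbol: bitstring}; frequent symbols get shorter codes."""
    heap = [[freq, order, sym] for order, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                         # degenerate single-symbol input
        return {heap[0][2]: "0"}
    order = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, [lo[0] + hi[0], order, (lo, hi)])  # merge the two lightest nodes
        order += 1
    codes = {}
    def walk(node, prefix):
        payload = node[2]
        if isinstance(payload, tuple):         # internal node: 0 for left, 1 for right
            walk(payload[0], prefix + "0")
            walk(payload[1], prefix + "1")
        else:
            codes[payload] = prefix
    walk(heap[0], "")
    return codes

if __name__ == "__main__":
    text = "mississippi river"
    codes = huffman_codes(text)
    encoded = "".join(codes[ch] for ch in text)
    print(codes)
    print(len(encoded), "bits vs", 8 * len(text), "bits uncompressed")
```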


5. DATA PROCESSING COST

5.1 Data Transmission
Let us consider a total of n devices present in the system. The whole file F is divided into n-1 chunks of data F1, F2, F3 … Fn-1. The offloader sends the partitioned data to the rest of the available devices. The time required to transmit the i-th chunk to the i-th device is

$t_d = \frac{F_i}{B_i}$, where $B_i$ is the bandwidth of the link to the i-th device.

5.2 Data Compression
The time required to execute the compression algorithm on the i-th device is given as

$t_c = \frac{F_i}{C_i}$, where $C_i$ is the rate at which the i-th device executes the compression algorithm.

5.3 Total Cost
$T = \sum_i t_d + \sum_i t_c = \sum_{i=1}^{n-1} \frac{F_i}{B_i} + \sum_{i=1}^{n-1} \frac{F_i}{C_i} = \sum_{i=1}^{n-1} F_i \left( \frac{1}{B_i} + \frac{1}{C_i} \right)$
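As a worked illustration of the total cost, the small sketch below evaluates T for three peers; the chunk sizes, bandwidths and compression rates are made-up numbers, not measurements from the paper.

```python
# Hypothetical figures: chunk sizes in MB, link bandwidths in MB/s and
# compression rates in MB/s for each of the n-1 peers.
chunks = [40, 30, 20]            # F_i
bandwidth = [6.0, 4.0, 5.0]      # B_i
rate = [10.0, 8.0, 12.0]         # C_i

t_transmit = [f / b for f, b in zip(chunks, bandwidth)]
t_compress = [f / c for f, c in zip(chunks, rate)]

# Total cost T = sum_i F_i * (1/B_i + 1/C_i), as derived above.
total = sum(f * (1 / b + 1 / c) for f, b, c in zip(chunks, bandwidth, rate))
print(round(total, 2), round(sum(t_transmit) + sum(t_compress), 2))  # both forms agree
```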

Table 1. Comparison of compression tools against the algorithms used.

Input (bytes)   | Compressed size (bytes)
                | Gzip       | Bzip2      | Deflate    | dcrz
18318           | 6248       | 5246       | 6229       | 8599
593683 [11]     | 223643     | 162272     | 223745     | 392584
1044021 [12]    | 406626     | 298655     | 406869     | 687024
76082066 [13]   | 52314738   | 41484576   | 52314828   | 57026284
249998858 [14]  | 41775833   | 34092849   | 41776024   | 83487252
420593261 [15]  | 419505932  | 421664234  | 419560443  | 421937107

6. CONCLUSION AND FUTURE WORK
Thus we have studied compression algorithms and how the computational power available from nearby devices can be exploited. This application will be useful for users who carry most of their data only on their smartphone. Wireless Distributed Computing can be used to conserve resources on the master device, which can then be used by other applications. Since all these operations are done locally, there is no need for the Internet or an access point. This application can be further extended in future to distribute other tasks (those that can be divided into sub-tasks) as well, without relying on external resources such as the Internet. Allocating resources in a multihop network is also possible, enabling the user to take advantage of multiple networks available nearby.

7. ACKNOWLEDGMENTS
The authors would like to thank their project guide Prof. Shilpa Pimpalkar for her guidance in their project. They also appreciate Dr. K. S. Wagh and Mr. Suresh Limkar for their support.

8. REFERENCES
[1] Hong Quy Le, Hussein Al-Shatri and Anja Klein,
“Optimal Joint Power Allocation and Task Splitting in
Wireless Distributed Computing", in SCC 2017; 11th
International ITG Conference on Systems,
Communications and Coding, 2017.
[2] G. Massari, M. Zanella and W. Fornaciari, “Towards
Distributed Mobile Computing”, in Mobile System
Technologies Workshop (MST), 2016.
[3] J. Ziv and A. Lempel, “On the Complexity of Finite Sequences", in IEEE Transactions on Information Theory, 1976.



[4] Sergio De Agostino, “Lempel-Ziv Data Compression on
Parallel and Distributed Systems", in Data Compression,
Communications and Processing (CCP), 2011.
[5] Mohammed M. Alfaqawi et al, “Adaptive Load
Balancing Algorithm For Wireless Distributed
Computing Networks", in Intelligent Systems
Engineering (ICISE), 2016.
[6] Gerardo Calice, Abderrahmen Mtibaa, Roberto Beraldi et
al, “Mobile-to-mobile opportunistic task splitting and
offloading", in Wireless and Mobile Computing,
Networking and Communications (WiMob), 2015.
[7] Eko Darwiyanto, Heru Anugrah Pratama and Gia
Septiana, “Text data compression for mobile phone using
burrows-wheeler transform, move-to-front code and
arithmetic coding", in Information and Communication
Technology (ICoICT), 2015.
[8] Jouni Siren, “Burrows-Wheeler Transform for
Terabases", in Data Compression Conference (DCC),
2016.
[9] J. Ziv and A. Lempel, “A Universal Algorithm for
Sequential Data Compression", in IEEE Transactions on
Information Theory, vol 23, 1977.
[10] Songze Li, Qian Yu, Mohammad Ali Maddah-Ali and A.
Salman Avestimehr, “A Scalable framework for Wireless
Distributed Computing”, in IEEE/ACM Transactions on
Networking, 2017.
[11] textfiles.com. [Online]. Available:
http://textfiles.com/etext/FICTION/2000010.txt
[12] textfiles.com. [Online]. Available:
http://textfiles.com/etext/FICTION/alcott-little-261.txt
[13] 4ksamples. [Online]. Available:
http://downloads.4ksamples.com/downloads/sample-
Elysium.2013.2160p.mkv
[14] National Center for Biotechnology Information.
[Online]. Available:
ftp://ftp.ncbi.nih.gov/genbank/gbbct1.seq.gz
[15] xiph-media. [Online]. Available: https://xiph-
media.net/video/derf/y4m/bridge_close_qcif.y4m

