A Novel Approach for Workload Optimization and Improving Security in Cloud Computing Environments

Copyright Notice & Disclaimer

© Atul Patil, 2015. All rights reserved. This article, titled “A Novel Approach for Workload Optimization and Improving Security in Cloud Computing Environments”, was originally published in the IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 17, Issue 2, Ver. 1 (Mar–Apr. 2015), PP 20–27.

Disclaimer: This article has been republished here by the original author, Atul Patil, in accordance with the copyright policies of the IOSR Journal of Computer Engineering (IOSR-JCE). The content remains unchanged to maintain its originality and authenticity. For further inquiries or copyright clarifications, please contact the author directly.

Abstract: This paper proposes a cloud infrastructure that combines on-demand allocation of resources with improved utilization and opportunistic provisioning of cycles from idle cloud nodes to other processes. Providing every demanded service to cloud consumers is very difficult, and meeting consumers' requirements is a major issue. Hence, an on-demand cloud infrastructure using a Hadoop configuration with improved CPU utilization and storage utilization is proposed, based on a splitting algorithm that uses Map-Reduce. As a result, all cloud nodes that would otherwise remain idle are put to use; security challenges are also addressed, and the system achieves load balancing and fast processing of large data in less time. We compare FTP and HDFS for file uploading and downloading, and enhance CPU utilization and storage utilization. Cloud computing moves application software and databases to large data centres, where the management of the data and services may not be fully trustworthy. This security problem is therefore solved by encrypting the data with an encryption/decryption algorithm, while the Map-Reduce algorithm solves the problem of utilizing all idle cloud nodes for larger data.

Keywords: CPU utilization, Storage utilization, Map-Reduce, Splitting algorithm, Encryption/decryption algorithm.

I. Introduction

Cloud computing is considered a rapidly emerging paradigm for delivering computing as a utility. In cloud computing, various cloud consumers demand a variety of services as per their dynamically changing needs, so it is the job of cloud computing to make all demanded services available to them. However, due to the availability of finite resources, it is very difficult for cloud providers to provide all the demanded services. From the cloud providers' perspective, cloud resources must be allocated in a fair manner, so meeting cloud consumers' QoS requirements and satisfaction is a vital issue. In order to ensure on-demand availability, a provider needs to overprovision: keep a large proportion of nodes idle so that they can be used to satisfy an on-demand request, which could come at any time. The need to keep all these nodes idle leads to low utilization. The only way to improve it is to keep fewer nodes idle, but this means potentially rejecting a higher proportion of requests, to a point at which a provider no longer provides on-demand computing [2].

Several trends are opening up the era of cloud computing, an Internet-based development and use of computer technology. Ever cheaper and more powerful processors, together with the “software as a service” (SaaS) computing architecture, are transforming data centers into pools of computing service on a huge scale. Meanwhile, increasing network bandwidth and reliable yet flexible network connections make it possible for clients to subscribe to high-quality services from data and software that reside solely on remote data centers.

In recent years, Infrastructure-as-a-Service (IaaS) cloud computing has emerged as an attractive alternative to the acquisition and management of physical resources. A key advantage of IaaS clouds is providing users on-demand access to resources. However, to provide on-demand access, cloud providers must either significantly overprovision their infrastructure (and pay a high price for operating resources with low utilization) or reject a large proportion of user requests (in which case the access is no longer on-demand). At the same time, not all users require truly on-demand access to resources [3]. Many applications and workflows are designed for recoverable systems where interruptions in service are expected.

Here we propose a cloud infrastructure with a Hadoop configuration that combines on-demand allocation of resources with opportunistic provisioning of cycles from idle cloud nodes to other processes. The objective is to handle larger data in less time and to keep all idle cloud nodes utilized by splitting larger files into smaller ones with a Map-Reduce algorithm, while also increasing CPU utilization and storage utilization for uploading and downloading files. To keep data and services trustworthy, security is maintained using the RSA algorithm, which is widely used for secure data transmission.

II. Related Work

There has been much research in the field of cloud computing over the past decades; some of it is discussed here. One work researched cloud computing architecture and its safety and proposed a new cloud computing architecture in which the SaaS model was used to deploy the related software on the cloud platform, so that resource utilization and the quality of scientific computing tasks would be improved [20]. Another proposed a cloud infrastructure that combines on-demand allocation of resources with opportunistic provisioning of cycles from idle cloud nodes to other processes by deploying backfill virtual machines (VMs) [21].

III. The Proposed System

Cloud computing has become a viable, mainstream solution for data processing, storage and distribution, but moving large amounts of data in and out of the cloud has presented an insurmountable challenge [4]. Cloud computing is an extremely successful paradigm of service-oriented computing and has revolutionized the way computing infrastructure is abstracted and used. The three most popular cloud paradigms include:

  1. Infrastructure as a Service (IaaS)
  2. Platform as a Service (PaaS)
  3. Software as a Service (SaaS)

The concept can also be extended to Database as a Service or Storage as a Service. Scalable database management systems (DBMSs), both for update-intensive application workloads and for decision-support systems, are a critical part of the cloud infrastructure. Initial designs included distributed databases for update-intensive workloads and parallel database systems for analytical workloads. Changes in the data access patterns of applications and the need to scale out to thousands of commodity machines led to the birth of a new class of systems referred to as key-value stores [5].

In the domain of data analysis, we adopt the Map-Reduce paradigm and its open-source implementation Hadoop for their usability and performance. The algorithm has six modules:

  1. Hadoop Multi-node Configuration (Cloud Server Setup)
  2. Client registration and Login facility
  3. Cloud Service Provider (Administrator)
  4. File Splitting Map-Reduce Algorithm
  5. Encryption/Decryption of Data for security
  6. Administration of client files (Third Party Auditor)

Hadoop Configuration (Cloud Server Setup)

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failures [6]. Hadoop implements Map-Reduce using the Hadoop Distributed File System (HDFS). HDFS allows users to have a single addressable namespace, spread across many hundreds or thousands of servers, creating a single large file system. Hadoop has been demonstrated on clusters with 2,000 nodes; the current design target is 10,000-node clusters.

Hadoop was inspired by Map-Reduce, a framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. The current Apache Hadoop ecosystem consists of the Hadoop kernel, Map-Reduce, and the Hadoop Distributed File System (HDFS).

JobTracker is the daemon service for submitting and tracking Map-Reduce jobs in Hadoop. Only one JobTracker process runs on any Hadoop cluster, in its own JVM; in a typical production cluster it runs on a separate machine, and each slave node is configured with the JobTracker node's location. The JobTracker is a single point of failure for the Hadoop Map-Reduce service: if it goes down, all running jobs are halted. The JobTracker in Hadoop proceeds as follows. Client applications submit jobs to the JobTracker. The JobTracker talks to the NameNode to determine the location of the data, locates TaskTracker nodes with available slots at or near the data, and submits the work to the chosen TaskTracker nodes. The TaskTracker nodes are monitored; if they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker. A TaskTracker will notify the JobTracker when a task fails, and the JobTracker then decides what to do: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable. When the work is completed, the JobTracker updates its status [9].

A TaskTracker is a slave-node daemon in the cluster that accepts tasks (Map, Reduce and Shuffle operations) from a JobTracker. Only one TaskTracker process runs on any Hadoop slave node, in its own JVM. Every TaskTracker is configured with a set of slots, which indicate the number of tasks it can accept. The TaskTracker starts a separate JVM process to do the actual work (called a Task Instance); this ensures that a process failure does not take down the TaskTracker. The TaskTracker monitors these task instances, capturing the output and exit codes. When a task instance finishes, successfully or not, the TaskTracker notifies the JobTracker. TaskTrackers also send heartbeat messages to the JobTracker, usually every few minutes, to reassure the JobTracker that they are still alive. These messages also inform the JobTracker of the number of available slots, so the JobTracker can stay up to date with where in the cluster work can be delegated [9].

The NameNode stores the entire file system namespace. Information like last modified time, created time, file size, owner, and permissions is stored in the NameNode. The fsimage on the NameNode is in a binary format; the “Offline Image Viewer” can be used to dump the fsimage in a human-readable format. When the number of files is huge, a single NameNode will not be able to keep all the metadata; in fact, that is one of the limitations of HDFS [9].


The Hadoop Distributed File System (HDFS)

HDFS is a fault-tolerant and self-healing distributed file system designed to turn a cluster of industry-standard servers into a massively scalable pool of storage. Developed specifically for large-scale data processing workloads where scalability, flexibility and throughput are critical, HDFS accepts data in any format regardless of schema, optimizes for high-bandwidth streaming, and scales to proven deployments of 100 PB and beyond [8].

Key HDFS Features:

  • Scale-Out Architecture — add servers to increase capacity
  • High Availability — serve mission-critical workflows and applications
  • Fault Tolerance — automatically and seamlessly recover from failures
  • Flexible Access — multiple and open frameworks for serialization and file system mounts
  • Load Balancing — place data intelligently for maximum efficiency and utilization
  • Tunable Replication — multiple copies of each file provide data protection and computational performance
  • Security — POSIX-based file permissions for users and groups with optional LDAP integration [8]

Client registration and Login facility

This module provides an interface to log in. The client can upload files to and download files from the cloud, and get a detailed summary of his account.

Cloud Service Provider (Administrator)

Administration of users and data, with the authority to add and remove users.

File Splitting Map-Reduce Algorithm

Map-Reduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function: a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system [7]. Map-Reduce is a massively scalable, parallel processing framework that works in tandem with HDFS. With Map-Reduce and Hadoop, compute is executed at the location of the data, rather than moving data to the compute location; data storage and computation coexist on the same physical nodes in the cluster. By taking advantage of this data proximity, Map-Reduce processes exceedingly large amounts of data without being affected by traditional bottlenecks like network bandwidth [8].

Our implementation of the File Splitting Map-Reduce algorithm runs on a large cluster of commodity machines and is highly scalable. Map-Reduce was popularized by the open-source Hadoop project. Our algorithm processes large files by dividing them into a number of chunks and assigning the tasks to the cluster nodes in a Hadoop multi-node configuration. In this way, the proposed File Splitting Map-Reduce algorithm improves the utilization of the cluster nodes in terms of time, CPU, and storage. A map operation is applied to each logical 'record' in the input to compute a set of intermediate key/value pairs, and a reduce operation is then applied to all the values that share the same key, in order to combine the derived data appropriately. The use of a programming model with user-specified map and reduce operations allows us to parallelize large computations easily [7]. It enables parallelization and distribution of large-scale computations, combined with an implementation of this interface that achieves high performance on large clusters of commodity PCs.

Programming Model

File Splitting Map-Reduce Algorithm

In this scenario, a client uploads or downloads files from the main server, where the File Splitting Map-Reduce algorithm executes. On the main server, the mapper function provides the list of available cluster IP addresses to which tasks are assigned, so that the task of file splitting is distributed to each live cluster node. The File Splitting Map-Reduce algorithm splits a file according to its size and the available cluster nodes.
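The paper does not list the splitting routine itself, so the size-and-node-driven split described above can be sketched as follows. This is a minimal illustration: the one-chunk-per-live-node policy and the `split_file` helper are our assumptions, not the published implementation (Hadoop itself splits by a fixed block size).

```python
import math

def split_file(data: bytes, node_ips: list) -> dict:
    """Split a byte payload into roughly equal chunks, one per live
    cluster node, so each node can process its part in parallel."""
    n = max(1, len(node_ips))
    chunk_size = math.ceil(len(data) / n)
    chunks = {}
    for i, ip in enumerate(node_ips):
        part = data[i * chunk_size:(i + 1) * chunk_size]
        if part:  # skip empty tails when data is smaller than n chunks
            chunks[ip] = part
    return chunks
```

For example, splitting a 10-byte payload across three nodes yields chunks of 4, 4, and 2 bytes, and concatenating the chunks in node order reconstructs the original file.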

The computation takes a set of input key/value pairs and produces a set of output key/value pairs. The user of the Map-Reduce library expresses the computation as two functions: Map and Reduce [7].

Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The Map-Reduce library groups together all intermediate values associated with the same intermediate key and passes them to the Reduce function [7].

The Reduce function, also written by the user, accepts an intermediate key and a set of values for that key. It merges these values together to form a possibly smaller set of values; typically just zero or one output value is produced per Reduce invocation. The intermediate values are supplied to the user's reduce function via an iterator, which allows lists of values that are too large to fit in memory to be handled [7].

The map and reduce functions supplied by the user have associated types:

Map (k1, v1) → list (k2, v2)

Reduce (k2, list (v2)) → list (v2)

It means the input keys and values are drawn from a different domain than the output keys and values. Furthermore, the intermediate keys and values are from the same domain as the output keys and values [7].
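These type signatures can be illustrated with the classic word-count computation. This is our own sketch in Python, not code from the paper; the sequential `run_mapreduce` driver stands in for the distributed runtime.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(filename, text):
    # Map (k1, v1) -> list (k2, v2): emit (word, 1) for every occurrence
    return [(word, 1) for word in text.split()]

def reduce_fn(word, counts):
    # Reduce (k2, list (v2)) -> list (v2): merge the partial counts
    return [sum(counts)]

def run_mapreduce(inputs):
    # Sequential stand-in for the runtime: map, group by intermediate
    # key, then reduce each group.
    intermediate = []
    for k1, v1 in inputs:
        intermediate.extend(map_fn(k1, v1))
    intermediate.sort(key=itemgetter(0))
    return {key: reduce_fn(key, [v for _, v in group])[0]
            for key, group in groupby(intermediate, key=itemgetter(0))}
```

Here `run_mapreduce([("a.txt", "the cat saw the dog")])` yields `{"cat": 1, "dog": 1, "saw": 1, "the": 2}`: the input domain is (filename, text), the intermediate and output domains are (word, count), matching the signatures above.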

This process is automatic parallelization: depending on the size of the raw input data, multiple Map tasks are instantiated; similarly, depending upon the number of intermediate <key, value> partitions, multiple Reduce tasks are instantiated. The Map-Reduce data-parallel programming model hides the complexity of distribution and fault tolerance.
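The task-count arithmetic above can be made concrete with a small sketch. The 64 MB split size is the classic HDFS default and is our assumption here, since the paper does not state one.

```python
import math

def num_map_tasks(input_bytes, split_bytes=64 * 1024 * 1024):
    # One Map task is instantiated per input split of the raw input data.
    return math.ceil(input_bytes / split_bytes)

def num_reduce_tasks(num_partitions):
    # One Reduce task per intermediate <key, value> partition.
    return num_partitions
```

A 200 MB input therefore yields ceil(200/64) = 4 Map tasks, while the number of Reduce tasks simply tracks the chosen partition count.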

Encryption/Decryption for data security using the RSA Algorithm

In this module, files are encrypted and decrypted using the RSA algorithm, which uses a public key and a private key for the encryption and decryption of data. The client uploads the file along with a secret/public key, from which a private key is generated, and the file is encrypted. In the reverse process, the file is decrypted using the public/private key pair and downloaded.
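Textbook RSA can be sketched as below. This is an illustrative toy only: the primes are tiny and there is no padding, whereas a real deployment would use a vetted library with 2048-bit keys; the helper names and default primes are ours, not the paper's.

```python
def egcd(a, b):
    # Extended Euclid: returns (g, x, y) with a*x + b*y == g
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

def make_keys(p=61, q=53, e=17):
    """Textbook RSA key generation with toy primes (insecure,
    for illustration of the public/private key pair only)."""
    n, phi = p * q, (p - 1) * (q - 1)
    d = egcd(e, phi)[1] % phi  # modular inverse of e mod phi
    return (e, n), (d, n)      # (public key, private key)

def encrypt(m, pub):
    e, n = pub
    return pow(m, e, n)  # c = m^e mod n

def decrypt(c, priv):
    d, n = priv
    return pow(c, d, n)  # m = c^d mod n
```

With these toy keys (the well-known n = 3233 example), `decrypt(encrypt(m, pub), priv)` recovers any message m < 3233, mirroring the upload-encrypt / download-decrypt flow described above.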

Administration of client files (Third Party Auditor)

This module provides a facility for auditing all client files and the various activities performed by clients. Log records are created and stored on the main server: for each registered client, a log record is created that captures which operations (upload/download) the client performed, along with the time and date at which each activity was carried out. These log records help with the safety and security of client data and with auditing. A log facility is also provided for the Administrator, recording the log information of all registered clients, so that the Administrator has control over all the data stored on the cloud servers. The Administrator can view per-client log records, which helps to detect fraudulent data access if a fake user tries to access the data stored on the cloud servers.

IV. Results

The results of the project are based on work done with a number of clients, one main server, and three to five secondary servers, and are evaluated on three parameters:

  1. Time
  2. CPU Utilization
  3. Storage Utilization

Our evaluation examines the improved utilization of cluster nodes (i.e., the secondary servers) by uploading and downloading files over HDFS versus FTP from three perspectives: first, improved time utilization; second, improved CPU utilization; and third, storage utilization, which also improves considerably.

Fig. 5: time utilization graph for uploading files. Fig. 5 shows the time utilization for FTP and HDFS when uploading files. The measurements are:


Results for time utilization

Uploading:
File Size (MB)    Time (s) for FTP    Time (s) for HDFS
2                 10                  2.5
3                 17.5                7.5
4.2               20                  10
7                 27                  12.5

Downloading:
File Size (MB)    Time (s) for FTP    Time (s) for HDFS
2                 10                  2.5
3                 17.5                7.5
4.2               20                  10
7                 27                  12.5
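The speedup implied by these rows can be computed directly from the reported numbers (our arithmetic on the table data, not a figure from the paper):

```python
# (file size in MB, FTP seconds, HDFS seconds) from the uploading rows
rows = [(2, 10, 2.5), (3, 17.5, 7.5), (4.2, 20, 10), (7, 27, 12.5)]

# HDFS speedup over FTP, keyed by file size
speedups = {size: ftp / hdfs for size, ftp, hdfs in rows}
```

The 2 MB upload is 4.0x faster over HDFS and the 7 MB upload 2.16x faster, so the reported advantage is largest for the smallest file.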

Results for CPU utilization

Fig. 7: CPU utilization graph for FTP files. Fig. 7 describes the CPU utilization for FTP file transfers across the cluster nodes.

Fig. 8: CPU utilization graph for Hadoop HDFS across the cluster nodes.

V. Conclusion

We have proposed an improved cloud infrastructure that combines on-demand allocation of resources with improved utilization and opportunistic provisioning of cycles from idle cloud nodes to other processes. A cloud infrastructure using a Hadoop configuration with improved CPU utilization and storage utilization is proposed using the File Splitting Map-Reduce algorithm. Hence, all cloud nodes that would otherwise remain idle are utilized; security challenges are also addressed; and the system achieves load balancing and fast processing of large data in less time. We compare FTP and HDFS for file uploading and downloading, and enhance CPU utilization and storage utilization.

Many previously proposed works use a Hadoop configuration for cloud infrastructure, but the cloud nodes still remain idle. To our knowledge, no prior work has compared CPU utilization and storage utilization for FTP files versus HDFS, as we have done.

We evaluate the backfill solution using an on-demand user workload on a cloud structure using Hadoop, and contribute an increase in CPU utilization and an improvement in transfer time between FTP and HDFS. In our work, all cloud nodes are fully utilized, no cloud node remains idle, and files are processed at a faster rate, so tasks are completed in less time, which is a big advantage and hence improves utilization.

References

[1]. Paul Marshall, “Improving Utilization of Infrastructure Clouds”, IEEE/ACM Cloud Computing, CO, USA, May 2011.

[2]. Shah, M.A., et al., “Privacy-preserving audit and extraction of digital contents”, Cryptology ePrint Archive, Report 2008/186, 2008.

[3]. Juels, A., Kaliski Jr., B.S., et al., “Proofs of retrievability for large files”, pp. 584–597, ACM Press, New York, 2007.

[4]. Aspera, an IBM company (2014). “Big Data Cloud”. Available: http://cloud.asperasoft.com/big-data-cloud/.

[5]. Divyakant Agrawal et al., “Big Data and Cloud Computing: Current State and Future Opportunities”, EDBT, pp. 22–24, March 2011.

[6]. The Apache Software Foundation (2014). “Hadoop”. Available: http://hadoop.apache.org/.

[7]. Jeffrey Dean et al., “MapReduce: simplified data processing on large clusters”, Communications of the ACM, Vol. 51, No. 1, pp. 107–113, January 2008.

[8]. Cloudera (2014). Available: http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/hdfs-and-mapreduce.html.

[9]. Stack Overflow (2014). “Hadoop Architecture Internals: use of job and task trackers”. Available: http://stackoverflow.com/questions/11263187/hadoop-architecture-internals-use-of-job-and-task-trackers.

[10]. J. Dean et al., “MapReduce: Simplified Data Processing on Large Clusters”, in OSDI, 2004.

[11]. J. Dean et al., “MapReduce: Simplified Data Processing on Large Clusters”, in CACM, January 2008.

[12]. J. Dean et al., “MapReduce: a flexible data processing tool”, in CACM, January 2010.

[13]. M. Stonebraker et al., “MapReduce and parallel DBMSs: friends or foes?”, in CACM, January 2010.

[14]. A. Pavlo et al., “A comparison of approaches to large-scale data analysis”, in SIGMOD, 2009.

[15]. A. Abouzeid et al., “HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads”, in VLDB, 2009.

[16]. F. N. Afrati et al., “Optimizing joins in a map-reduce environment”, in EDBT, 2010.

[17]. P. Agrawal et al., “Asynchronous view maintenance for VLSD databases”, in SIGMOD, 2009.

[18]. S. Das et al., “Ricardo: Integrating R and Hadoop”, in SIGMOD, 2010.

[19]. J. Cohen et al., “MAD Skills: New Analysis Practices for Big Data”, in VLDB, 2009.

[20]. Gaizhen Yang et al., “The Application of SaaS-Based Cloud Computing in the University Research and Teaching Platform”, ISIE, pp. 210–213, 2011.

[21]. Paul Marshall et al., “Improving Utilization of Infrastructure Clouds”, IEEE/ACM International Symposium, pp. 205–214, 2011.
