Novel Method to Improve ACO Performance on the GPU Using CUDA for Nurse Roster Scheduling Problem

Copyright Notice & Disclaimer

© Atul Patil, 2016. All rights reserved. This article, titled "Novel Method to Improve ACO Performance on the GPU Using CUDA for Nurse Roster Scheduling Problem", was authored and published by Atul Patil. It was originally featured in the International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), ISSN (Online): 2278-1021, ISSN (Print): 2319-5940, Volume 5, Issue 3 (March 2016). The original publication can be accessed at https://www.researchgate.net/publication/312372026_Novel_Method_to_Improve_ACO_Performance_on_the_GPU_Using_CUDA_for_Nurse_Roster_Scheduling_Problem.

Disclaimer: This article is republished here by the original author, Atul Patil, in accordance with the copyright policies of the International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE). The content remains unchanged to preserve its originality. For any inquiries or copyright-related matters, please contact the author directly.

Abstract: This paper presents a parallel implementation of the Ant Colony Optimization (ACO) algorithm on the Graphics Processing Unit (GPU) to solve the Nurse Roster Scheduling Problem (NRSP). We implement the schedule-construction and pheromone-update phases of ACO using a data-parallel approach, applying the roulette wheel selection method in both phases. The parallel implementation of roulette wheel selection considerably cuts the execution time of schedule construction. Our new parallel implementation executes up to 8-12x faster than the sequential version while preserving the quality of the constructed schedules.

Keywords: CUDA, GPU, NRSP, NVIDIA, ACO

I.  INTRODUCTION

Ant Colony Optimization (ACO) [1] is a well-known population-based algorithm for modelling and solving discrete optimization problems. Ant algorithms model the behaviour of real ants to solve a variety of optimization and distributed control problems. We apply ACO to the Nurse Roster Scheduling Problem (NRSP), where the main objective is to find the optimal schedule among a set of candidate schedules. The simplest implementation of ACO consists of two core phases: schedule construction and pheromone update. To improve the quality of schedules, an additional local search stage can be applied after schedules have been constructed and before the pheromone-update phase. Schedule construction and pheromone update are performed iteratively until a termination condition is met. The ants communicate indirectly through a pheromone matrix: each ant forms a new schedule and updates the matrix, which influences subsequent iterations of the algorithm. The computation time required for schedule construction grows as the number of nurses and the number of days increase, and therefore demands significant CPU time. To improve computation time we implemented a parallel ACO with roulette wheel selection, in which the schedule-construction and pheromone-update phases are performed independently for each ant; this independence makes ACO remarkably well suited to GPU parallelization.
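To make this two-phase structure concrete, the following is a minimal host-side sketch in CUDA C of the iterative loop just described. The kernel names, signatures, and launch geometry are illustrative assumptions of this sketch, not the paper's actual API.

    #include <cuda_runtime.h>

    // Illustrative kernel declarations standing in for the two ACO phases.
    __global__ void construct_schedules(const float *choice_info,
                                        int *schedules, int *costs);
    __global__ void update_pheromone(float *pheromone, const int *schedules,
                                     const int *costs, int num_ants);

    // m = number of ants, d = number of days; pointers are device allocations.
    void run_aco(int m, int d, int max_iter, float *pheromone,
                 float *choice_info, int *schedules, int *costs)
    {
        for (int iter = 0; iter < max_iter; iter++) {
            // Data-parallel mapping: one thread block per ant, one thread per day.
            construct_schedules<<<m, d>>>(choice_info, schedules, costs);
            // Evaporation followed by deposit, as detailed in Section III.
            update_pheromone<<<m, d>>>(pheromone, schedules, costs, m);
            cudaDeviceSynchronize();  // loop until the termination condition is met
        }
    }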

There are two key approaches to implementing ACO in parallel. In the first, each ant is assigned to an individual processing element, and the processing elements together form a colony of ants. In the second, an entire colony of ants is assigned to a processing element, usually combined with a method of communication between colonies. Multiple colonies executed in parallel can potentially reduce the number of iterations needed to reach a solution. For parallel programming, NVIDIA CUDA is a programming architecture for developing general-purpose applications for GPU execution [2]. Compute Unified Device Architecture (CUDA) exposes the GPU's massively parallel architecture so that parallel code can be written to execute significantly faster than its optimized sequential equivalent.

Our parallel ACO implementation on the GPU using CUDA focuses on both the performance of the algorithm and the quality of the solutions. A data-parallel approach is applied to execute both phases of ACO in parallel on the GPU. For the schedule-construction phase, our method uses a new parallel implementation of the roulette wheel selection algorithm called DS-Roulette. DS-Roulette suits modern hardware architecture, extends parallelism, and reduces execution time. For the pheromone-update phase, we incorporate the methodology of the MAX-MIN Ant System and combine it with our implementation.

II.  RELATED WORK

ACO algorithms can be categorized as coarse-grained or fine-grained with respect to parallel implementation. In the fine-grained method, individual ants are mapped to processing elements, with ant-to-ant communication between processing elements; in the coarse-grained method, entire colonies are mapped to processing elements, with colony-to-colony communication. This section reviews the existing parallel ACO techniques that target the GPU.

Catala et al. [1] describe the first GPU implementations of ACO, aimed at demonstrating the potential of the approach. Their implementations rely on programming the GPU directly through graphics models to solve general-purpose problems.

Jiening et al. [2] implemented the MMAS algorithm to solve the TSP. This work was published prior to the advent of CUDA, and their implementation was more complex than its CPU equivalent. A parallel schedule-construction phase resulted in slightly better performance.

Fu et al. [3] applied a parallel MMAS on the GPU to solve the Travelling Salesman Problem. Their technique focused more on the MATLAB implementation and less on the GPU. They reported a speedup; however, their reference CPU implementation was MATLAB-based, which is fundamentally slow because MATLAB is an interpreted language.

Zhu and Curry [4] designed a search algorithm using ant colony optimization to solve non-linear function optimization problems using CUDA. They reported performance of approximately 2.5x over the sequential implementation.

Bai et al. [5] describe a multiple-colony version of MMAS using coarse-grained CUDA parallelism to solve the Travelling Salesman Problem. Each ant colony is mapped to a thread block, and within each block every thread is mapped to an ant. This method produces solutions of quality comparable to the CPU implementation, but the reported speedup was only up to 2x.

Weiss [6] developed a parallel version of AntMinerGPU, an extension of the MMAS algorithm, in which each ant within the colony is mapped to a separate CUDA thread. He claims that this method, combined with the AntMinerGPU algorithm, allows for considerably larger population sizes. All phases of the work are moved to the GPU to avoid costly transfers to and from the device.

A. Data-parallelism

The ACO algorithm for solving the TSP on the GPU using CUDA was implemented by Cecilia et al. [3]. The existing task-based method of mapping one ant per thread is fundamentally ill-suited to the GPU. With a task-based method, each thread must store its ant's memory. This works for small schedules but becomes problematic for larger schedules because of the limited shared memory available. An alternative is to use fewer threads per block, which reduces GPU utilization, or to store each ant's memory in global memory, which sharply decreases kernel performance. A further problem with task-based parallelism is warp branching: as the ants construct a schedule, their execution paths generally differ due to the conditional statements inherent in applying roulette wheel selection to the output of the random proportional rule. When a warp branches, all threads within the branch are serialized and execute sequentially until the branching section is complete, significantly delaying the branching code. Data parallelism avoids the warp-divergence and memory issues: each ant is mapped to a thread block, and all threads inside the block cooperate to perform a collective task such as schedule construction. Each thread is responsible for a single day, and the likelihood of visiting a day can be calculated without branching the warp using a proportionate selection method known as I-Roulette [3]. The pheromone-update phase is likewise implemented on the GPU. A 5x speedup factor is reported when both the schedule-construction and pheromone-update phases are executed on the GPU, with the majority of the execution time spent in the schedule-construction phase.
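As a concrete illustration of branch-free selection, here is a sketch of the I-Roulette idea as described by Cecilia et al. [3]: every thread scores its own candidate day with (probability x random draw), and a max-reduction picks the winner, so no thread takes a different path from the rest of its warp. The array sizes and the power-of-two block size are assumptions of this sketch.

    // Branch-free proportionate selection (I-Roulette [3]), one candidate per
    // thread. prob[] and rnd[] hold each day's selection weight and a uniform
    // random draw; d <= blockDim.x <= 1024 and blockDim.x is a power of two.
    __device__ int i_roulette_select(const float *prob, const float *rnd, int d)
    {
        __shared__ float score[1024];
        __shared__ int   winner[1024];
        int l = threadIdx.x;
        score[l]  = (l < d) ? prob[l] * rnd[l] : -1.0f;  // padding never wins
        winner[l] = l;
        __syncthreads();
        // Tree reduction that keeps the index of the largest score.
        for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
            if (l < stride && score[l + stride] > score[l]) {
                score[l]  = score[l + stride];
                winner[l] = winner[l + stride];
            }
            __syncthreads();
        }
        return winner[0];  // every thread sees the selected day
    }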

III.  IMPLEMENTATION

This section presents our parallel implementation of the ACO algorithm for execution on the GPU. A data-parallel approach is used, mapping each ant in the colony to a thread block. To maximize performance we execute every phase of the algorithm on the GPU.

The first phase of the algorithm constructs the nurse data and allocates memory and the relevant data structures. For a given number of nurses n and number of days d, the constraints are loaded into a matrix covering every pair of distinct nurses for each day. Ant memory is allocated to store each ant's current schedule and schedule length. A pheromone matrix is initialized on the GPU to store pheromone levels, and a secondary structure called choice_info is used to store the precomputed products that form the denominator of the selection probability. After the initialization phase completes, the pheromone matrix is artificially seeded with a schedule generated by greedy search.
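The paper does not spell out the contents of choice_info; a common ACO formulation precomputes choice_info[i][j] = pheromone[i][j]^alpha * heuristic[i][j]^beta, and the following kernel is a sketch under that assumption, with heuristic[] standing in for the loaded constraint matrix.

    #define ALPHA 1.0f  // pheromone weight (assumed standard ACO parameter)
    #define BETA  2.0f  // heuristic weight (assumed standard ACO parameter)

    // One thread per matrix entry; launch with, e.g.,
    // compute_choice_info<<<(n * n + 255) / 256, 256>>>(...).
    __global__ void compute_choice_info(const float *pheromone,
                                        const float *heuristic,
                                        float *choice_info, int n)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < n * n) {
            choice_info[idx] = powf(pheromone[idx], ALPHA) *
                               powf(heuristic[idx], BETA);
        }
    }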

  1. Schedule construction

    Procedure ConstructSolution
        Schedule[1] = assign the ant to a random day
        for j = 2 to n - 1 do
            for l = 1 to n do
                Prob[l] = CalProb(Schedule[1 : j - 1], l)
            end-for
            Schedule[j] = RouletteWhlSelection(Prob)
        end-for
        Schedule[n] = remaining day
        ScheduleCost = CalScheduleCost(Schedule)
    End

This procedure is applied repeatedly until a new schedule has been created for every ant. The algorithm shown in Figure 1 gives the details of schedule construction.

Figure 1: Pseudo-code for construction of a solution

After the initialization, the first inner for-loop repeats n - 2 times to build a complete schedule; note that there are only n - 2 choices to make, since once n - 2 days have been chosen the final day is forced. Within the inner for-loop, the probability of moving from the last visited day to every other possible day is calculated. Calculating the probability consists of two stages: retrieving the value of choice_info[j][l], and checking whether day l has already been visited in the current iteration, in which case the probability is set to 0. The next day to visit is then selected using roulette wheel selection.
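A device-side sketch of that two-stage calculation follows. The visited[] flag array representing each ant's tabu memory is an assumption of this sketch, and the returned weights are unnormalized because roulette wheel selection normalizes them itself.

    // Stage 1: look up the precomputed choice weight for moving from
    // current_day to candidate. Stage 2: zero it if candidate was already
    // scheduled in this iteration (visited[] is per-ant tabu memory).
    __device__ float cal_prob(const float *choice_info, const int *visited,
                              int current_day, int candidate, int n)
    {
        if (visited[candidate]) {
            return 0.0f;
        }
        return choice_info[current_day * n + candidate];
    }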

TABLE I. ROULETTE WHEEL SELECTION

Input    Reduced    Normalized    Range
0.1      0.1        0.0625        > 0.0    & < 0.0625
0.3      0.4        0.25          > 0.0625 & < 0.25
0.2      0.6        0.375         > 0.25   & < 0.375
0.8      1.4        0.875         > 0.375  & < 0.875
0.2      1.6        1.00          > 0.875  & < 1.0

Roulette wheel selection is illustrated in Table I, where one item must be chosen from five in proportion to the values in the first column, labelled "Input". First, the input values are reduced to cumulative totals; these reduced values are then normalized so that the sum of all input values becomes 1, giving each item its portion of the roulette wheel. The final step is generating a random number between 0.0 and 1.0 and selecting the item whose range contains it. The obstacles to parallelizing this algorithm are its linear nature, divergence in control flow, and the thread synchronization required for parallel random number generation. The entire schedule is stored in shared memory, but for large instances shared memory is often exhausted. To address these problems we present Double-Spin Roulette (DS-Roulette), a highly parallel roulette selection algorithm that exploits warp-level parallelism, reduces shared-memory dependencies, and decreases the overall instruction count performed by the GPU. In the sequential implementation of roulette wheel selection, each ant constructs a schedule one day at a time and each ant is processed consecutively. In our parallel implementation of the schedule-construction phase using a data-parallel approach, each ant is assigned to a thread block, so that m blocks are occupied by m ants.
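The paper does not list DS-Roulette line by line, so the following CUDA C sketch shows only the warp-level primitive that such a scheme can be built from: a shuffle-based prefix sum produces the cumulative totals of Table I inside one warp, and a ballot locates where the random draw lands. Warp shuffles require a Kepler-class or newer GPU; on the Fermi-class GT 610 used in Section IV, a shared-memory scan would take their place.

    // One roulette "spin" across a warp: each of the 32 lanes holds one
    // candidate's non-negative weight; rand01 is a uniform draw in [0, 1).
    __device__ int warp_roulette(float weight, float rand01)
    {
        const unsigned mask = 0xffffffffu;
        int lane = threadIdx.x & 31;
        float prefix = weight;
        // Inclusive prefix sum over the warp (Kogge-Stone with shuffles),
        // i.e. the "Reduced" column of Table I.
        for (int offset = 1; offset < 32; offset <<= 1) {
            float up = __shfl_up_sync(mask, prefix, offset);
            if (lane >= offset) prefix += up;
        }
        float total  = __shfl_sync(mask, prefix, 31);  // sum of all weights
        float target = rand01 * total;                 // spin the wheel
        // Lanes at or beyond the landing point vote 1; the first such lane
        // is the selected candidate.
        unsigned vote = __ballot_sync(mask, prefix >= target);
        return __ffs(vote) - 1;
    }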

  2. Pheromone update

Pheromone update is the last stage of the ACO algorithm and consists of two phases: pheromone evaporation and pheromone deposit. The pheromone evaporation phase is simple to parallelize, as all edges are evaporated uniformly: a single thread block is launched which assigns each thread to an edge and reduces its value by a constant factor, with an overlaying strategy used to cover all edges. The second phase, pheromone deposit, deposits a quantity of pheromone on each edge belonging to each ant's constructed schedule. Because the ants perform this step in parallel, atomic operations must be used to ensure the correctness of the pheromone matrix.
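A sketch of the evaporation kernel under these assumptions: one thread per edge, a grid-stride loop standing in for the overlaying strategy, and the evaporation rate of 0.1 from Section IV.

    #define RHO 0.1f  // evaporation rate (Section IV)

    // Every edge of the pheromone matrix is reduced by the constant factor
    // (1 - RHO); the grid-stride loop covers all num_edges = n * n entries
    // regardless of launch size.
    __global__ void evaporate(float *pheromone, int num_edges)
    {
        for (int e = blockIdx.x * blockDim.x + threadIdx.x;
             e < num_edges;
             e += gridDim.x * blockDim.x) {
            pheromone[e] *= (1.0f - RHO);
        }
    }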

Atomic operations are computationally expensive, so an alternative approach using scatter-to-gather transformations can be used, which removes the dependency on atomic operations. To reduce the usage of atomic operations and increase convergence speed, we implement the pheromone update so that each ant makes a single atomic min operation on a memory value storing the best schedule length. This single operation per block allows the lowest schedule cost to be saved without extra kernels. For the schedule-construction phase we launch m thread blocks representing m ants, and the saved schedule cost is the lowest overall cost.
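A hedged sketch of the deposit step follows: one block per ant deposits along its own schedule with atomicAdd, and thread 0 performs the single atomic min on the best cost. The integer cost encoding and the 1/cost deposit amount are assumptions of this sketch, chosen because atomicMin operates on integers.

    // One thread block per ant; schedules holds each ant's n chosen days
    // back to back, costs holds each ant's schedule cost as an integer.
    __global__ void deposit(float *pheromone, const int *schedules,
                            const int *costs, int *best_cost, int n)
    {
        const int *schedule = &schedules[blockIdx.x * n];  // this ant's schedule
        float amount = 1.0f / (float)costs[blockIdx.x];    // cheaper => more pheromone
        // Threads stride over the schedule's edges; atomicAdd keeps the
        // matrix correct when several ants share an edge.
        for (int j = threadIdx.x; j < n - 1; j += blockDim.x) {
            atomicAdd(&pheromone[schedule[j] * n + schedule[j + 1]], amount);
        }
        // The single atomic operation per block highlighted above.
        if (threadIdx.x == 0) {
            atomicMin(best_cost, costs[blockIdx.x]);
        }
    }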

IV.  EXPERIMENTAL RESULTS

In this section we summarize the results obtained using the above technique on various instances of the NRSP and compare them to other parallel and sequential implementations. We use standard ACO parameters but reduce the rate of evaporation on the pheromone matrix from 0.5 to 0.1 for both the GPU and CPU implementations. The reduced evaporation rate ensures that the pheromone matrix retains adequate pheromone to influence schedule construction. To evaluate our implementation we used an NVIDIA GT 610 GPU and an Intel i3 CPU. Our implementation was written in C, compiled with the latest CUDA toolkit, and executed on Windows 7 using Microsoft Visual Studio.

A. Solution Quality

To assess the quality of the schedules formed, we compared the results of our GPU implementation against an existing CPU implementation for a set number of iterations. With identical parameters and iteration counts, our new method was able to match, and in some cases improve, the quality of the schedules constructed. Table 2 and Figure 2 show a comparison of the average quality of schedules obtained by the existing CPU implementation and the new GPU implementation.

Table 2: Average solution quality

Nurse Roster Instance    CPU Average Solution Quality    GPU Average Solution Quality
Instance 20              0.85                            0.92
Instance 21              0.75                            0.80
Instance 22              0.80                            0.90
Instance 23              0.70                            0.80
Instance 24              0.75                            0.80

Figure 2: Comparison of average solution quality on CPU and GPU

Figure 3: Execution time on CPU

Figure 4: Execution time on GPU

The results shown in Figures 3 and 4 demonstrate that the GPU implementation with this novel method is 8-12x faster than the sequential implementation. The schedule-construction phase consumes most of the total execution time. It uses a new, efficient implementation of roulette wheel selection and is able to achieve speedups comparable to other algorithms limited by the execution time of proportionate selection. The pheromone-update implementation is between 1x and 9x faster than the existing CPU implementation.

V.  CONCLUSIONS

In this paper, we implemented a data-parallel GPU version of the ACO algorithm to solve the Nurse Roster Scheduling Problem, executing both the schedule-construction and pheromone-update phases on the GPU. The results obtained show execution 8-12x faster than the existing CPU implementation. For large data sets, our algorithm handles shared memory more efficiently than the existing system. Overall, an efficient parallel implementation of roulette wheel selection gives better performance than existing parallel and sequential implementations, and we believe this parallel implementation of the algorithm is also well suited to other heuristic problem-solving areas.

REFERENCES

  • A. Catala, J. Jaen, and J. Modili, "Strategies for accelerating ant colony optimization algorithms on graphical processing units," in IEEE Congress on Evolutionary Computation (CEC), Sept. 2007, pp. 492-500.
  • W. Jiening, D. Jiankang, and Z. Chunfeng, "Implementation of ant colonies algorithm based on GPU," in Sixth Int. Conf. on Computer Graphics, Imaging and Visualization (CGIV), Aug. 2009, pp. 50-53.
  • J. Fu, L. Lei, and G. Zhou, "A parallel ant colony optimization algorithm with GPU-acceleration based on all-in-roulette selection," in Third Int. Workshop on Advanced Computational Intelligence (IWACI), Aug. 2010, pp. 260-264.
  • J. M. Cecilia, J. M. García, A. Nisbet, M. Amos, and M. Ujaldon, "Enhancing data parallelism for ant colony optimization on GPUs," J. Parallel Distrib. Comput., vol. 73, no. 1, pp. 42-51, 2013.
  • W. Zhu and J. Curry, "Parallel ant colony for nonlinear function optimization with graphics hardware acceleration," in Proc. Int. Conf. on Systems, Man and Cybernetics, Oct. 2009, pp. 1803-1808.
  • H. Bai, D. Ouyang, X. Li, L. He, and H. Yu, "MAX-MIN ant system on GPU with CUDA," in Fourth Int. Conf. on Innovative Computing, Information and Control (ICICIC), Dec. 2009, pp. 801-804.
  • A. Delévacq, P. Delisle, M. Gravel, and M. Krajecki, "Parallel ant colony optimization on graphics processing units," J. Parallel Distrib. Comput., vol. 73, no. 1, pp. 52-61, 2013.
  • M. Dorigo, "Optimization, learning and natural algorithms," Ph.D. dissertation, Dipartimento di Elettronica, Politecnico di Milano, Milan, Italy, 1992.
  • M. Manfrin, M. Birattari, T. Stützle, and M. Dorigo, "Parallel ant colony optimization for the traveling salesman problem," in Fifth Int. Workshop on Ant Colony Optimization and Swarm Intelligence (ANTS), ser. Lecture Notes in Computer Science, M. Dorigo, L. M. Gambardella, M. Birattari, A. Martinoli, R. Poli, and T. Stützle, Eds., vol. 4150. Springer Verlag, 2006, pp. 224-234.
  • T. Stützle, "Parallelization strategies for ant colony optimization," in Fifth Int. Conf. on Parallel Problem Solving from Nature (PPSN-V). Springer-Verlag, 1998, pp. 722-731.
  • T. Stützle and H. H. Hoos, "MAX-MIN ant system," Future Gener. Comput. Syst., vol. 16, no. 9, pp. 889-914, Jun. 2000. [Online]. Available: http://dl.acm.org/citation.cfm?id=348599.348603
  • D. Kirk and W.-M. W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2010.
  • NVIDIA, "Inside Kepler," http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0642-GTC2012-Inside-Kepler.pdf
  • Y. You, "Parallel ant system for traveling salesman problem on GPUs," GPUs for Genetic and Evolutionary Computation, GECCO, http://www.gpgpgpu.com/gecko 2009 (last accessed 29/01/2013).
  • W. W. Hwu, GPU Computing Gems Emerald Edition. Morgan Kaufmann, 2011.
  • W.-M. W. Hwu, GPU Computing Gems Jade Edition. Morgan Kaufmann, 2011.
  • M. Dorigo, "Ant Colony Optimization - Public Software," http://iridia.ulb.ac.be/~mdorigo/ACO/aco-code/public-software.html (last accessed: 29/01/2013).
  • NVIDIA, "CUDA C Programming Guide," http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html (last accessed 29/01/2013).
