Parallel the optimal reduced tour length. The

Parallel Simulated Annealing to Solve TSPAjinkya Wakhale, Sushant Chaudhari, Pavan kumar HanumantharayaUniversity of Maryland, Baltimore County, Department of CSEE1000, Hilltop Circle, Baltimore, Maryland, USA,21227December, [email protected] [email protected] [email protected]?— ?We implement a serial and parallel algorithm forsolving Travelling Salesman Problem(TSP) using simulatedannealing methodology. We propose to solve TSP by applyingMessage Passing Interface(MPI) in order to reduce the overalltime required to calculate the optimal solution with parallelsimulated annealing approach. In our algorithm, the number ofoperational executions is proportional to the number ofavailable nodes in the cluster.

This case study, analyses andcompares the performance of serial and parallel approachesfor solving TSP. It illustrates the decrease in execution timefor calculating best path and shows that parallel approach isbest-suited in case of high-end machines.Keywords?— Simulated Annealing, Travelling SalesmanProblem, Parallel Algorithm, Bluespeed Cluster, Multi-coresystems.I. INTRODUCTIONSimulated Annealing is a probabilistic technique namedafter metal heat treatment process to find the approximateglobal optimum of a given function. Temperature is the mostvital variable in annealing process and it determines theoverall energy state of the process. Simulated Annealingprovides a solution for optimization problems and focuses onconverting mathematical insights from the physics domain toreal world optimization problems. Simulated Annealing ofTSP approach starts with an arbitrary initial tour and changesfrom one tour to another when we encounter a new city.

Theresult will be the optimal reduced tour length.The Traveling Salesman Problem (TSP) is one of the classicproblems in combinatorial optimization research. “Given a setof cities and distance between every pair of cities, the problemis to find the shortest possible route that visits every cityexactly once and return to the starting city”18. Its solution iswidely used in diverse fields such as job sequencing, X-Raycrystallography, computer wiring, VLSI chip fabrication,vehicle routing 16. Any improved algorithms to findoptimized solutions to this problem will be adapted to anentire class of NP-complete problems.MPI is a kind of communication protocol for programmingparallel computers.

MPI provides a standard set of baseroutines that will be efficiently implemented in parallelcomputer hardware on a distributed system. It supports bothpoint-to-point and collective communication. MPI is one ofthe highly used model in high performance computation 13.MPI provides essential communication, virtual topology andsynchronization functionalities between set of differentprocesses.In this paper, we have implemented both serial and parallelversion of TSP using simulated annealing principles. Wedemonstrate, the significant reduction in execution time tofind the best path using MPI as communication interfacebetween nodes . We analyze the performance results of boththe implemented algorithms.The rest of the paper, we discuss related work, overview ofTSP, Simulated Annealing, Parallel Communication Interface,explain pseudocode and implementation of both serial andparallel algorithms, discuss several parameters andbenchmarks, compare serial performance with parallel andfinally present our conclusions.

II. ?RELATED WORKWe briefly review selected related work and previousalgorithms used for solving TSP using simulated annealingmethod.In 1982, Kirkpatrick 11 introduced simulated annealingand gave basic applications in obtaining solutions foroptimization problems. The aim was to develop a crystallattice such that it is the form of highly ordered silicon anddefect-free for semiconductor manufacturing.

The materialwas ‘annealed’ to achieve the end product and the process istermed as ‘Simulated Annealing’. It is a heuristic approachbased on Monte Carlo iterative solution method.TSP is combination optimization problem modelled as agraph problem to find the shortest possible route between thecities.

Simulated annealing is powerful and most effective insolving optimization problems such as TSP.In 1995, David S. Johnson and Lyle A. McGeoch 17applied Simulated annealing to TSP problem and gave thedetailed analysis of their results and possible speeduptechniques. They used Neighbourhood Pruning and a lowvalue to initialize temperature. They concluded that1combining neighbourhood pruning and low-temperature yieldsbetter results for Simulated annealing on TSP.In past few years, there were many pieces of research andstudies about different applications of TSP in discrete fields.In 2009, XuZhihong 14 implemented Ant colony algorithmto solve TSP using simulated annealing.

They proposedparallel methods to implement Ant colony algorithm usingMPI. They analyzed the performance difference between theparallel algorithm and traditional algorithm. In conclusion,they proposed a hybrid algorithm as a combination of antcolony and parallel simulated annealing approach to solveTSP resulting in low-cost and high efficient results.III. OverviewTravelling Salesman Problem(TSP) is an NP-Hard problem,which means that there is no algorithm that gives us bestsolution to TSP problem. Although it is difficult to predict thebest solution, we often only want a good solution.

We alwaysworry about the inordinate amount of computing time andresources which a solution might take and hence we put ourfocus on finding a solution that takes less time and computingresource. To find a good solution quickly, we often usestochastic optimization methods. We select randomlygenerated routes and incrementally improve them in somemanner to find a better route.

The change depends on thealgorithm used to solve a problem like TSP.A common approach to solve TSP is to think of solutionsexisting within a landscape. We can move from one solutionto another in search of better solutions. One methodology isHill Climbing method. Hill climbing algorithm which is agreedy algorithm first picks up a place to start and then take asolution which is uphill in our landscape and stop when thereare no more uphill steps. This means that in Hill climbingapproach, it is pretty quick to reach the top of the hill in ourlandscape but it depends on the initial starting position.

Thebiggest hill in solution landscape is global maximum and topof any other hill is local maxima. Standard hill climbing getsstuck at top of local maxima.Simulated Annealing is the modified version of hillclimbing.

It has the better chance of finding the globalmaxima in the solution landscape. As we know hill climbingalgorithm sometimes get stuck at the local optimal solution,simulated annealing tries to address this problem by moving inboth the direction in solution landscape. i.e both uphill anddownhill. So. simulated annealing is basically a hill climbing,but with an ability to go downhill.

In Simulated annealing, for each iteration creates aneighbour by changing some parameters of the solution andthen checks its neighbour function value against the previoussolution. To find the approximate solution for a problem likeTSP the number of iterations required is huge. Thissignificantly burdens the resources and increases the timecomplexity of the problem. Hence the parallelized version ofSimulated annealing yields better results in terms of total timetaken for an input. To effectively parallelize an algorithmbetween processes, selecting inter-process communicationmodel is very important.

One of the best light-weightedmodels for inter-process communication is MPI. It is alow-level parallel programming library but it provides severalinbuilt methods which help in parallel distribution of workamong the nodes in the cluster. In MPI, the applicationcommunicates with other processes using messages in order toperform a task.IV. ImplementationSerial SA Algorithm for TSPInput:n = number of city count.T = Initial Temperature 0Tk = The temperature at k-th iteration of simulated annealingCity Co-ordinates(s): set of given city co-ordinates in a tourNeighbor State (s?): City obtained by a random switch of cityarray.City Cost Matrix(C): total Euclidian distance between all thecitiesDifference in Cost ( ? ): the relative difference in cost cbetween s and s?Cooling Constant ( ? ): The rate at which temperature isdecreased in each iteration.

Probability Function (P): probability of selecting costly stateand to avoid staying in the local maximaTk+1 = ?Tk, where ? is some constant between 0 and 1P( ?, T ) = for >0 k { e ,??Tk ? > 0 TkP( ?, Tk) = { 1, ? <= 0 for Tk >0When ? > 0, for any given T, P is greater for smaller valuesof ? .Output?:B = best solution.time = Total time for calculating the approximate bestsolution.1) Choose a random city s and define T0 and ?2) Create a new state s? by randomly swapping two cities incity matrix3) Compute ? = C(s)C(s?) ? C(s)a. If ? <= 0, then s = s?b.

If ? > 0, then assign s = s? with probabilityP( ?, T ) ki. Compute Tk+1 = ?Tk and increment k4) Repeat the steps 2 and 3 keeping track of the best solutionuntil terminating conditions are met.Terminating Conditions: The algorithm terminates once wereach the goal cost and specified minimum temperature.Terminating condition plays a very important in simulatedannealing since we need to stop as soon as we reach the global2optimum. Hence, selection of cooling factor and initialtemperature is very important.We have implemented parallel TSP using SimulatedAnnealing approach with MPI as the communication protocolbetween the processes. The algorithm for the parallel versionis as below:Parallel SA Algorithm for TSPInput?: The input remains the same as serial implementationonly the computation and communication is in MPI channels.

1) Initialize MPI2) Choose a random city s and define T0 and ?3) Create a new state s? by randomly swapping two cities incity matrix4) Selection of root process and division of workloadif(Process_Id == 0)://this is a master processMPI_Scatter(city_distance,size_per_process,MPI_INT,weights_per_process,size_per_process,MPI_INT,0,MPI_Comm_world);5) Compute ? = C(s)C(s?) ? C(s)a. If ? <= 0, then s = s?b. If ? > 0, then assign s = s? with probabilityP( ?, T ) ki. Compute Tk+1 = ?Tk and increment k6) Gather local minima calculated in each process back to theroot.MPI_Gather(min_weight, cities_per_process, MPI_INT,global_min_weight, cities_per_process, MPI_INT, 0,MPI_Comm_world);7) Pick the best minimum solution among all the gatheredlocal minima from the worker processes.Terminating conditions remains the same as serialimplementation in each process.The time consumed to calculate the best path inparallel implementation is dependent on the number ofavailable processes in the cluster. As the number of processesincreases the time consumed decreases.

Once, processes reachthe threshold value the expected time increases due to morecommunication between the processes. Hence, optimalselection of processes is vital.V. DiscussionConsider the fig. 1 15 which shows two localminima A and B. B is the global maxima in this case and ourgoal is to reach the global maxima and avoid getting stuck atlocal maxima A.

If we place a ball randomly at any point inthe landscape and allow it to travel only downhill, has equalprobabilities of reaching A or B. To increase the probability ofthe ball to reach B, we can use shaking to simulate thetemperature in the entire process of reaching the goal.fig 1.

Energy landscapeShaking helps the ball to cross from A to B. But weneed to make sure that the shaking is not so furious that it goesin the opposite direction, further away from the globalmaxima. So the intensities of shaking are large in the initialstages and as we reach towards the goal it is reduced. This willincrease the probability of the ball to reach from A to B andreduce the probability from B to A. We are using the sameconcept in choosing our parameters for implementing theparallel version of the algorithm.

We choose initial temperature T = , where 0 ?Emax?Emax is the maximum difference in the cost that existsbetween any two neighbouring tours. This helps us to ensurethat all the possible paths are explored until the temperaturereduces to a final low temperature. We choose the finaltemperature very low but not as zero, to omit the chances ofaccepting a worse solution. We select the suitable lowtemperature such that the algorithm reaches a state where itcan no longer accept a better or worst solution.The cooling rate should allow the algorithm to runenough number of iterations at each temperature to helpstabilise the system. So we choose to reduce the temperaturelinearly with the help of a cooling constant. In our algorithm,we choose less number of iterations at higher temperaturesand more number of iterations at lower temperatures.

This isimportant to completely explore the local optimums and notconfuse it with global optima.VI. Demonstration System ArchitectureWe tested both the serial and parallel versions of ourcode on IBM MINSKY cluster, also called as Bluespeed.Bluespeed is built on next-gen IBM 200 Petaflops summitprocessor with two nodes of IBM Power S822LC.

.Itintegrates with dual power 8 plus processors with 10 cores at3.0 GHz.It has the capacity of 30 TERABYTES of FlashRAM whereas each node has 5 TERABYTE internal storage,1 TB of RAM. CPU architecture is 64-bit POWER8 with 2Socket(s), 10 Core(s) per socket and 8 Threads per core. TotalProcessor to memory bandwidth is 115 GB/sec.193VII. Analysis and ResultsOur experiments show that initially when the numberof processes is one, the time taken by the serial and parallelalgorithm is almost the same.

But as we increase the numberof processes, the execution time of the parallel algorithmsignificantly reduces.To analyze the overall complexity and compare the results forboth serial and parallel version, we used weak scaling andStrong scaling.In weak scaling, we compare both serial and parallel versionfor the different number of inputs and then we compare theirrunning time. In our case, to vary the input, we increase thetotal number of cities to both the program. For parallelversion, we fixed the total number of processes to 8 and thenwe analyze the results.fig 2.

Weak scalingFrom fig 2, it can be inferred that total running time reducessignificantly for the parallel version. When the number ofcities increases above 15,000 the speedup is almost 3 to 4times.In strong scaling, we analyze the parallel version of theprogram, keeping the total number of cities constant to 5000.Then we compare the total running time of the program byincreasing the total number of processors.fig 3.Strong ScalingFig. 3 demonstrates that the overall running time reducessignificantly for the parallel program when we increase thetotal number of processes.

The reduction in total running timeis constant as we approach higher number of processes, butrunning time increases when the total number of processesequals 20. This is primarily due to the reason that as thenumber of processes increases, there is an overhead ofinterprocess communication.IX. ConclusionIn this paper, we have presented and implementedboth serial and parallel approaches to solve TSP usingSimulated annealing. We have speed up the execution ofwhole simulated annealing algorithm by using MPI forparallel communication.

We have designed the MPI structuresuch that it will be easily replaced with any other parallelcommunication interface. We have shown the analysis andperformance comparison of serial and parallelimplementation. We have illustrated that with equaldistribution of city groups and proper initial values forsimulated annealing, the time required for generating the besttour is proportional to the number of processes in the cluster.Our experimental results on Bluespeed cluster have shownthat our parallel TSP algorithm is very fast compared to otherapproaches.AcknowledgementWe are grateful to Dr.

Alan Sherman for giving ustimely feedback and helpful comments.4REFERENCES1P. Jaillet. ProbabilisticTravelingSalesmanProblems. Ph.

D. Dissertation, MIT, Cambridge MA, USA., 1985.2J. Breckling,Ed.,TheAnalysisofDirectionalTimeSeries: D.J.

Bertsimas, L. Howell. Further Results on the Probabilistic Traveling Salesman Problem. European Journal of Operational Research 65, 68–95.

1993.3NeillE. Bowler,ThomasM. A.

Fink,andRobinC. Ball. Characterization of the probabilistic traveling salesman problem. Physical Review E, 68:036703, 2003.4M.

Jünger,G. Reinelt,andG. Rinaldi,”Thetravelingsalesmanproblem,”inNetworkModels,M. Ball,T.

Magnanti,C. Monma,andG. Nemhauser,Eds. Amsterdam,TheNetherlands: North-Holland,1995,pp. 225–330.5Delin,Luo.,Lixiao,Zhang.

,Zhihui,Xu. (August,2011). “HeuristicsimulatedannealinggeneticalgorithmforTravellingSalesmanProblem”. RetrievedonOctober08,20176L. Bianchi.

Antcolonyoptimizationandlocalsearchfortheprobabilistictravelingsalesmanproblem: acasestudyinstochasticcombinatorialoptimization. PhDthesis,Univ. LibredeBruxelles,Brussels,Belgium,2006.7Jeong,C.S.,Kim,M.

H. (n.d).

“FullParallelSimulatedAnnealingforTravellingSalesmanProblem”. RetrievedonOctober08,20178Li,Yang.,Zhou,Aimin.,Zhang,Guixu.

(2011). “SimulatedAnnealingwithProbabilisticNeighborhoodforTravellingSalesmanProblems”.9Liu,Xiaojun.,Zhang,Bin.,Du,Fangying. (2014).

“IntegratingRelativeCoordinateswithSimulatedAnnealingtoSolveaTravellingSalesmanProblem”.10Malek,Miroslaw.,Guruswamy,Mohan.,Pandya,Mihir. (1989).

“SerialandParallelSimulatedAnnealingandTabusearchAlgorithmsforTravellingSalesmanProblems”.11Rutenbar,Rob.A. (n.d). “SimulatedAnnealingAlgorithms: AnOverview”. RetrievedonOctober08,201712RajeshMatai,SuryaSinghandMurariLalMittal(2010).

TravelingSalesmanProblem: anOverviewofApplications,Formulations,andSolutionApproaches,TravelingSalesmanProblem,TheoryandApplications.13Sur,Sayantan.,Koop,Matthew.,Panda,Dhabaleswar.

(2006). “High-performance and scalable MPI over InfiniBand with reduced memory usage: an in-depth performance analysis”.14XuZhihong.,SongBo.,GuoYanyan. (2009). “UsingSimulatedAnnealingandAntColonyHybridAlgorithmtoSolveTravellingSalesmanProblem”.15G.

E. HintonandT.J.

Sejnowski,LearningandrelearninginBoltzmarmmachines,in: ParallelDistributedProcessing,Vol. 1(MITPress,1986).16″ApplicationsofCombinatorialoptimization,”talkatthe13thInternationalMathematicalProgrammingSymposium,Tokyo,198817DavidS. Johnson,LyleA.

McGeoch(1995). “TheTravelingSalesmanProblem: AcasestudyinLocalOptimization18″TravellingSalesmanProblem”(n.d).

RetrievedfromWikipedia19″CHMPRLABatUMBC”https://chmpr.umbc.edu/bluespeed”5