CENG521 performance of the spinlock algorithms including

CENG521 Final Exam answersName: Nourhan AbuzayedSt.

No.: 242630001PhD Student                  ????? 8 7 6 5 4 3  2 1 ???? Done 95% Done 95% Done 90% Done 100% Done 100% Done 100% Done 100% ?? ?????? ????? ??? ????? ????? ??? ????? ???????? ????? ????? ??????           ?????? %2   Questions(10points) Discuss the performance of the spinlock algorithms including naive,ticket, Anderson array based queuing, and MCS list based queuing by using the performanceresults provided in the paper we discussed in the lecture. Answer:?Lockis a Mechanism for a thread to make sure that when it is accessing some shareddata, it is not interfered by the other threads–Thread gets lock, does the work with data, then releases We have measured the performance of variousspin lock and barrier algorithms on the BBN Butterfly 1, a distributed sharedmemory multiprocessor, and the Sequent Symmetry Model B, a cache-coherent,shared-bus multiprocessor. (explain of figure 4: Performance of spin locks on the Butterfly (empty critical section))the performance on the Butterfly of thespin lock is as follows:as the number of processors increases,Thesimple test and set lock have the worst behavior in terms of timemeasurement. while, the ticket lock without backooff is a little bitfaster,because of the polling with a read instead of a Fetch-and ?.the anderson is much better than the previous two, while the MCS performs thebest.(explain Figure 5: Performance of selected spin locks on the Butterfly (empty criticalsection))As the number of processors increases, interms of time scale;The test and set algorithm have the worst behavior(maximumtime), with the time to acquire and release the lock increasing dramaticallyeven over this small range of processors, A simple test and set would performeven worse, (explain Figure 6 shows performance results for several spin lock algorithms on the Symmetry)Here data structures is adjusted in minorways to avoid the unnecessary invalidations which result from placing unrelateddata items in the same cache line.

Test and set algorithm shows the worstbehavior. with the time to acquire and release the lock increasing dramaticallyeven over this small range of processors. A simple test and set would performeven worse. Because the atomic add instruction on theSymmetry does not return the old value of its target location, implementationof the ticket lock is not possible on the Symmetry, nor is it possible toimplement Anderson?s lock directly. In his implementation , Anderson introduced an outer test and set lock with randomized backoto protectthe state of his queue. This strategy is reasonable when the critical sectionprotected by the outer lock namely acquisition or release of the inner lockissubstantially smaller than the critical sectionprotected by the inner lock This was not the casein our initial test so the graph in gure  actually results from processors contendingfor the outer lock instead of the inner queuebased lock (Mellor-Crummey & Scott, 1991) //——————————————————————————————————————- 2) (10 points) Discuss theperformance of the scheduling policies including First Come FirstServed, Fixed Processor, Last Processor, Minimum Intervening by using theperformance results provided in the paper we discussed in the lecture. Answer:Schedulingis assigning the processor to execute threads. ?Throughput: thenumber of threads completed in unit time.

  Figure2.1: throughput performance of FCFC,LP,FP (Squillante, Squillante, &Lazowska, 1993)Discsussion: As shown in Figure2.1, with light loads,the performance of LP and FCFS are the same, so FP better for small values of C,because of load imbalance problems of the FP policy. With higher loads,LP and FP are a slight different, as a result, FCFS dominates most values of C becauseof FCFS cache reload overhead.   Figure2.

2 throughput performance of MI,FCFS,LP,FP (Squillante et al., 1993)Discuson:As in Figure2.2. With light loads: MI better than LP,and as C increase the difference increase. This due to the use of more valuesfor X and T. and because of the similar affinity information with MI.

 With higher loads, MI outperforms LP.FCFS,as C increased, its throughput decrease. While FP doesn’t affect.   Figure2.3 variance in response time of MI,LMR,FCFS,LP,FP (Squillante et al., 1993)Discussion:The MIis worse than LP in terms of response time.

some tasks has excellent servicewhile others has poor. since MI makes locally optimal scheduling based oncache-reload transient penalty. Summary:greedy algorithms (MI,LP) have high throughputs and low response time, whilethey suffer from high variance in response time and latency with high loads(Squillante etal., 1993). ////—————————————————————————————————————-(Optional-20points) Propose a scheduling policy that can perform better than these fourpolicies.Answer:Inthis proposed solution, I suggest that; the task that has a shortest time isperformed before others. With taking into account that the other tasks can’t bepostponed more than a specified limit.

Ex. After (5) short jobs we can select ajob from all jobs that wait for a time and did not get its turn to be executed inorder to not let it wait forever or wait a long time in the case of all comingjobs are shorter than it, by this way we can guarantee fast execution of jobs,and prohibiting so long time jobs to wait a very long time. This solution mayensure fairness which is a disadvantage of greedy algorithms.Example:Supposewe have jobs arrived in the following sequence with their times task order 1 2 3 4 5 6 7 8 9 10 11 12 duration 10 2 1 1 4 11 2 4 2 1 3 8 Sort according to shortest 1 1 1 2 2 2 3 4 4 8 10 11  Theexecution as: 1 , 1,  1, 2, 2, 10,2, 2,  3, 4,4, 11, 8If weuse scheduling according to shortest, the 10, and 11 will wait too much (notfair)related to other short jobs, but in the proposed approach they areselected earlier (after 5 selection) //—————————————————————————————————— 3) (20 points) Give twoexamples for extensible kernels which provide flexible OS services. Explain howeach one achieves the flexibility and ensures the safety.

 Answer:Examples:Exokernel, SPIN and (L3 microkernel) Exokernel:Exokernelachieves flexibility by securely multiplexes hardware resources, usingAegis that supports group of OS primitive operations that make the system to bealtered. Also a high level of abstractions is used: virtual memory (VM), inter processcommunication (IPC). Also the implementation can be customized by employing theLibrary operating systems, which works upper the interface of the Exokernel, asa result; this can achieve higher level of abstractions.  Exokernel uses 3 methodsto export the resources securely. Which are secure bindings, visible resourcerevocation and the abort protocol. But the Safety is ensured by the securebindings which are applications that handle events and bind to machineresources securely.

There are 3 main techniques are used to apply securebindings: manage access to all hardware accurately, control authorizations toresources, employs Software TLB to cache secure bindings(Engler,Kaashoek, O ‘toole, ¤, & Jr., 1995). SPIN:SPIN achieves theflexibility by  providing kernelextensions to update or extend the operating system code. So the extensions arelinked and bined in a dynamic way. There is an event driven model for serviceupdate and customization.

SPIN’s Safety is ensuredby the facilities of Programming Language, And Extensions are verified bycompiler. Such that, the programming language that used is module 3, which is amodular language that support the advantage of binderies between interfacemodules. Colocationsupports logical domain protection, modularity and dynamic binding, which letinterfaces to be defined and accessed safely with small overhead (B. N.Bershad et al.

, 1995).//——————————————————————————————————— 4) (10 points) What is therole of Remote Interface in RMI? What does binding mean in RMI?Answer: Remote interface is Interface that declares the methodsof a remote object (methods that invoked from non-local virtual machine) suchthe clients of remote objects can interact according the remote interfaces, notduring the implementation classes of interfaces. So RMI is used in invoking amethod on a remote object (Wollrath, Riggs, &Waldo, 1996). Binding is the process to register a name for a remoteobject, that may be used later to look up that object. A remote object registrationcan be can be done using bind() or rebind() methods of the Naming class(Wollrath et al., 1996). Thus, before any first call, the client should bind to aserver interface.

When the bind occured, the client becomes authorized to accessthe interface procedures(Brian N. Bershad,Anderson, Lazowska, & Levy, 1990). //——————————————————————————————————- 5) (10 points) What are thesources of RPC (Remote Procedure Call) overhead? Explain briefly.Answer:Sourcesof the RPC OverheadMarshalingDatacopyingControltransferProtocolprocessing (Thekkath & Levy, 1993). Marshalingand Data copyingThree copies1– Client stub: (RPC stubs make illusion of procedural interface for theclient and server.

those procedures marshalling the arguments calls in and outof the message packet.)2– Kernel buffer: (map the user’sbuffer to the kernel)3 – DMA to controller: (in acontroller that scatter gather the DMA over the bus, the host marshals data inhost memory then the controller transfer data over the bus) Reduce to two copiesThe above three copies can be reduced to two copies, first;Marshal into kernel buffer directly or,second; shared descriptors about the argumentsbetween client stub and kernel. ControltransferFour context switches:  RPC call requires four context switches: 1- clientswitching out,2- server switching in, 3- server switching out, 4- client switchingback in.

Reduce to two switches:  two in the critical path of latency while theothers can be overlapped with network communicationReduce to one switch: oneof switches can be reduced in case of spinning instead of switch on in the clientside ProtocolProcessing high level of acknowledgement:so most messages are delivered Hardware check summing:to improve packet integrity No buffering in the clientside: because the client blocked for the callduration, retransmission data continues in client address space, similar tooriginal call. As a result there will be no latency in messages since nobuffering. Overlap server sidebuffering with result transmission in the network: sothere is no latency in the reply path(Thekkath & Levy, 1993).  //nnd may be need small word cheching(if I have time)or not  //——————————————————————————————————- 6) (20points) Compare Xen and ESX by explaining which one is more suitable for whatkind of usage scenarios. Answer: In general; Xen outperforms ESX Server in mostbenchmarks (Barham et al., 2003).    Esx(Waldspurger, 2002) Xen(Barham et al., 2003) Page Sharing ESX Server supports page sharing – allowing virtual machines to reduce memory consumption by sharing identical pages.

  – hosting ESX only monitors sharing memory within a single host. single virtual machine hosting a real OS that may multiplex 1000 of unmodified user level processes. Xen can host up to 100 virtual machine instances on modern server.   Hardware resources Multiplex hardware resources  Xen, can share multiple operating systems to share hardware safely. Suitable for Linux, Windows XP. reclaim memory ESX do memory reclamation by using Ballooning. So it can be used in linux and windows   – Memory Allocation policies ESX allow it   allowed Resource Allocation Resource rights distributed to clients using shares   allowed Idle memory tax clients have idle pages are penalized compared with active ones, when memory is small – Dynamic allocation Allowed by using thresholds   I/O page remapping ESX remap hot pages from high physical memory addresses to lower machine addresses. this reduce redundancy and copying page overhead   Supported by Windows, linux Linux and Windows XP, BSD   OS instantiation – In xen, users dynamically instantiate OS to execute what they want(Barham et al.

, 2003)  //——————————————————————————————————- 7) (10 points) Explain thedifferences between the operations of RVM (recoverable virtual memory) andVista RVM.Answer: Figure 7.1: operations of RVM system(Lowell & Chen, 1997)Asshown in the figure above, it do 3 copies for every transaction:undolog: a copy(old value) to in memory(duringtransaction)redolog: a copy(changes) to on-disk(duringtransaction)logtruncation: add minimum one extra copy for eachupdated data in the log. In RVM;crash recovery treated by replaying transactions commitment from the redo logto the DB. Figure7.

2: Vista Operations(Lowell & Chen, 1997) Asshown in figure 7.2; Vista operations use Rio file cache. Which has two mainoperations write(copy the data from application buffer to the file cache) andmmap(mapping part of file cache directly into the address space of application).

1.when a transaction starts, Vista copies the before images to the undo log 2.During the transaction, permanent data are written directly to the application 3.When transaction ends, undo log is discarded. Invista;Crash Recovery is treated similar abort using (Undo log)to survivecrashes because it is in RIO.

 Asclear from figures:Vista removes 2 operation of 3(that exist in RVM) ; it keepsundo log, it uses no redo log, no system calls, and only one memory to memorycopy(Lowell & Chen, 1997) //——————————————————————————————————- 8) (10 points) Explain howGMS (global memory system) algorithm handles the global memory hit and globalmemory miss cases.Answer:Whenfaulted page in the global memory of another node (Q), this is handled by: thedesired page in Q’s global memory is swapped with any global page in P’s globalmemory.  As a result, faulted pagebecomes a local page, the size of P’s local memory increases by one. Q’s memorybalance still as it is. (memory Hit).

 Whenfaulted page in the global memory of node Q, and P’s memory has only localpages. This can be handled by exchanging the LRU local page on P with thefaulted page on Q. The size of the global memory on Q andthe local memory on P are unchanged. ( memory Hit).

  When the page is on disk (outside the cluster).Read the faulted page into local memory of (node P). Also select the oldestpage in the cluster (in node Q) then write it to disk. Also, send any globalpage from P to global memory of Q. (memory miss). Whenthe faulted page is a shared page located in the local memory of another node (Q),this is handled by taking a copy of that faulted page into a frame on node P.and  selecting the oldest page in thecluster (node R) and writing it to disk.

Also sending a global page on node Pto a global memory of node R(Feeley et al., 1995). //—————————————— //nnd comment: ??? ?? ??? ??????(????? ????? ???) ?? ??? ?????????? ?????? ??????Pptfrom net with title: https://www.google.com.tr/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwi3rK-GptLYAhUF1hQKHaMmCg0QFggyMAE&url=https%3A%2F%2Fcourses.

cs.washington.edu%2Fcourses%2Fcse451%2F00wi%2Flectures%2Fgms.ppt&usg=AOvVaw1EhYbLdUrmGZTjmiRP_0Ii //——————————————  ReferencesBarham, P., Dragovic,B., Fraser, K., Hand, S., Harris, T.

, Ho, A., … Warfield, A. (2003). Xen andthe art of virtualization. Proceedings of the Nineteenth ACM Symposium onOperating Systems Principles  – SOSP ‘03,164. https://doi.org/10.1145/945445.

945462Bershad, B. N.,Anderson, T. E.

, Lazowska, E. D., & Levy, H. M. (1990).

Lightweight remoteprocedure call. ACM Transactions on Computer Systems, 8(1),37–55. https://doi.org/10.1145/77648.77650Bershad, B. N.

,Savage, S., Pardyak, P., Sirer, E. G.

, Fiuczynski, M. E., Becker, D., … Eggers,S. (1995). Extensibility safety and performance in the SPIN operating system.

ACMSIGOPS Operating Systems Review, 29(5), 267–283.https://doi.org/10.1145/224057.

224077Engler, D. R.,Kaashoek, M. F., O ‘toole, J.

, ¤, ¢, & Jr., J. O. T. (1995).

Exokernel: AnOperating System Architecture for Application-Level Resource Management. ACMSIGOPS Operating Systems Review, 1(212), 251–266.https://doi.org/10.1145/224057.

224076Feeley, M. J., Morgan,W. E.

, Pighin, E. P., Karlin, A. R., Levy, H.

M., & Thekkath, C. A. (1995).Implementing global memory management in a workstation cluster. Proceedingsof the Fifteenth ACM Symposium on Operating Systems Principles  – SOSP ’95, 201–212.

https://doi.org/10.1145/224056.224072Lowell, D.

E., &Chen, P. M. (1997). Free transactions with Rio Vista. ACM SIGOPS OperatingSystems Review, 31(5), 92–101. https://doi.org/10.

1145/269005.266665Mellor-Crummey, J. M.

,& Scott, M. L. (1991). Algorithms for scalable synchronization onshared-memory multiprocessors. ACM Transactions on Computer Systems, 9(1),21–65. https://doi.org/10.

1145/103727.103729Squillante, M. S.

,Squillante, M. S., & Lazowska, E. D. (1993). Using Processor-Cache AffinityInformation in Shared-Memory Multiprocessor Scheduling. IEEE Transactions onParallel and Distributed Systems, 4(2), 131–143.

https://doi.org/10.1109/71.

207589Thekkath, C. A., &Levy, H. M. (1993). Limits to low-latency communication on high-speed networks.ACM Transactions on Computer Systems, 11(2), 179–203.

https://doi.org/10.1145/151244.151247Waldspurger, C. A.

(2002). Memory resource management in VMware ESX server. ACM SIGOPSOperating Systems Review, 36(SI), 181.https://doi.

org/10.1145/844128.844146Wollrath, A., Riggs,R., & Waldo, J.

(1996). A Distributed Object Model for the Java System. Confon ObjectOriented Technologies, 9(June), 265–290. Retrieved from http://www.usenix.org/publications/computing/9.4.html