
CENG521 Final Exam answers

Name: Nourhan Abuzayed

St. No.: 242630001

PhD Student

 

 

 

 

 

 

 

 

 


 

 

Questions

1) (10 points) Discuss the performance of the spinlock algorithms including naive, ticket, Anderson array-based queuing, and MCS list-based queuing by using the performance results provided in the paper we discussed in the lecture.

 

Answer:

A lock is a mechanism by which a thread ensures that, while it is accessing some shared data, it is not interfered with by other threads: the thread acquires the lock, does its work on the data, then releases the lock.

 

The paper measures the performance of various spin lock and barrier algorithms on the BBN Butterfly 1, a distributed shared-memory multiprocessor, and the Sequent Symmetry Model B, a cache-coherent shared-bus multiprocessor.

(Explanation of Figure 4: performance of spin locks on the Butterfly, empty critical section)

The performance of the spin locks on the Butterfly is as follows: as the number of processors increases, the simple test-and-set lock shows the worst timing behavior. The ticket lock without backoff is slightly faster, because it polls with a read instead of a fetch_and_Φ operation. Anderson's lock is much better than the previous two, while the MCS lock performs best.
(Explanation of Figure 5: performance of selected spin locks on the Butterfly, empty critical section)

As the number of processors increases, the test-and-set lock shows the worst behavior (the longest times), with the time to acquire and release the lock increasing dramatically even over this range of processors; a simple test_and_set without backoff would perform even worse.

 

(Explanation of Figure 6: performance results for several spin lock algorithms on the Symmetry)

Here the data structures are adjusted in minor ways to avoid the unnecessary invalidations that result from placing unrelated data items in the same cache line. The test-and-set lock shows the worst behavior, with the time to acquire and release the lock increasing dramatically even over this small range of processors; a simple test_and_set would perform even worse.

Because the atomic add instruction on the Symmetry does not return the old value of its target location, the ticket lock cannot be implemented on the Symmetry, nor can Anderson's lock be implemented directly. In his implementation, Anderson introduced an outer test-and-set lock with randomized backoff to protect the state of his queue. This strategy is reasonable when the critical section protected by the outer lock (namely, acquisition or release of the inner lock) is substantially smaller than the critical section protected by the inner lock. This was not the case in the initial test, so the graph in Figure 6 actually reflects processors contending for the outer lock rather than for the inner queue-based lock.
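The MCS lock, which performs best in these experiments, can be sketched in the same Java style; the paper's fetch_and_store and compare_and_swap map to getAndSet and compareAndSet here. Each thread spins only on a flag in its own queue node, which is what eliminates the shared hot spot:

import java.util.concurrent.atomic.AtomicReference;

// MCS list-based queue lock: waiters form an explicit linked queue and
// each spins only on its own node, so there is no global hot spot.
class MCSLock {
    static final class QNode {
        volatile boolean locked = false; // spun on only by this node's owner
        volatile QNode next = null;
    }
    private final AtomicReference<QNode> tail = new AtomicReference<>(null);

    void lock(QNode me) {
        me.next = null;
        QNode pred = tail.getAndSet(me);       // fetch_and_store on the tail
        if (pred != null) {                    // queue non-empty: enqueue and wait
            me.locked = true;
            pred.next = me;
            while (me.locked) { /* spin on our own node */ }
        }
    }

    void unlock(QNode me) {
        if (me.next == null) {
            // No visible successor: try to swing the tail back to empty.
            if (tail.compareAndSet(me, null)) return;
            while (me.next == null) { /* a successor is linking in; wait */ }
        }
        me.next.locked = false;                // hand the lock directly over
    }
}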

 

(Mellor-Crummey & Scott, 1991)

 

//——————————————————————————————————————-

 

2) (10 points) Discuss the performance of the scheduling policies including First Come First Served, Fixed Processor, Last Processor, and Minimum Intervening by using the performance results provided in the paper we discussed in the lecture.

 

Answer:

Scheduling is the assignment of processors to execute threads.

Throughput: the number of threads completed per unit time.

 

Figure 2.1: throughput performance of FCFS, LP, and FP (Squillante & Lazowska, 1993)

Discussion:

As shown in Figure 2.1, under light loads the performance of LP and FCFS is the same, and FP is better only for small values of C because of the load-imbalance problems of the FP policy. Under higher loads, LP and FP differ only slightly, and FCFS falls behind for most values of C because of its cache-reload overhead.

 

Figure 2.2: throughput performance of MI, FCFS, LP, and FP (Squillante & Lazowska, 1993)

Discussion:

As shown in Figure 2.2, under light loads MI is better than LP, and the difference grows as C increases; this is due to the use of more values for X and T, and to the affinity information MI maintains. Under higher loads MI outperforms LP; the throughput of FCFS decreases as C increases, while FP is unaffected.
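A hypothetical Java sketch of the MI idea (the names and data layout are mine, not the paper's): dispatch a ready thread to the idle processor where the fewest other threads have run since this thread last ran there, using that count as a proxy for how much of its cache footprint survives:

// Hypothetical sketch of Minimum Intervening: intervening[t][p] counts
// how many other threads have run on processor p since thread t last
// ran there; a smaller count suggests more of t's cache state survives.
class MinimumIntervening {
    private final int[][] intervening;

    MinimumIntervening(int numThreads, int numProcessors) {
        intervening = new int[numThreads][numProcessors];
    }

    // Pick the idle processor with the smallest intervening count.
    int pickProcessor(int thread, boolean[] idle) {
        int best = -1;
        for (int p = 0; p < idle.length; p++) {
            if (idle[p] && (best < 0 || intervening[thread][p] < intervening[thread][best])) {
                best = p;
            }
        }
        return best; // -1 if no processor is idle
    }

    // Bookkeeping on dispatch: every other thread loses one unit of
    // affinity for p; the dispatched thread's count on p resets to zero.
    void onDispatch(int thread, int p) {
        for (int t = 0; t < intervening.length; t++) intervening[t][p]++;
        intervening[thread][p] = 0;
    }
}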

 

 

Figure 2.3: variance in response time of MI, LMR, FCFS, LP, and FP (Squillante & Lazowska, 1993)

Discussion:

MI is worse than LP in terms of variance in response time: some tasks receive excellent service while others receive poor service, because MI makes locally optimal scheduling decisions based on the cache-reload transient penalty.

 

Summary: the greedy policies (MI, LP) achieve high throughput and low response times, but they suffer from high variance in response time and high latency under heavy loads (Squillante & Lazowska, 1993).

 

////—————————————————————————————————————-

(Optional, 20 points) Propose a scheduling policy that can perform better than these four policies.

Answer:

In this proposed policy, the task with the shortest duration is executed before the others, subject to the constraint that no task may be postponed beyond a specified limit. For example, after every five short jobs we select the job that has waited longest without getting its turn, so that it does not wait forever (or for a very long time, as could happen if all arriving jobs were shorter than it). In this way we guarantee fast execution of short jobs while preventing long jobs from waiting indefinitely. This restores fairness, which is the usual weakness of greedy policies.

Example:

Suppose jobs arrive in the following sequence, with these durations:

task order:  1   2   3   4   5   6   7   8   9  10  11  12
duration:   10   2   1   1   4  11   2   4   2   1   3   8

Sorted by shortest duration: 1, 1, 1, 2, 2, 2, 3, 4, 4, 8, 10, 11

 

The execution order is then: 1, 1, 1, 2, 2, 10, 2, 3, 4, 4, 8, 11.

Under pure shortest-first scheduling, the jobs of duration 10 and 11 would wait far too long (unfairly) relative to the short jobs, but in the proposed approach they are selected earlier (after every five selections), as the sketch below illustrates.
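A minimal Java sketch of the proposed policy, assuming all jobs are already queued (the class and field names are mine): shortest-job-first, except that after every five picks the longest-waiting job runs, which bounds starvation:

import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.PriorityQueue;

// One job: an id and a duration.
class Job {
    final int id, duration;
    Job(int id, int duration) { this.id = id; this.duration = duration; }
}

class ShortestWithAging {
    private final PriorityQueue<Job> byDuration =
            new PriorityQueue<>(Comparator.comparingInt((Job j) -> j.duration));
    private final Deque<Job> byArrival = new ArrayDeque<>(); // head = waited longest
    private int picksSinceAging = 0;

    void submit(Job j) { byDuration.add(j); byArrival.addLast(j); }

    Job next() {
        Job j;
        if (picksSinceAging == 5 && !byArrival.isEmpty()) {
            j = byArrival.pollFirst();  // aging pick: the longest-waiting job
            byDuration.remove(j);
            picksSinceAging = 0;
        } else {
            j = byDuration.poll();      // normal pick: the shortest job
            byArrival.remove(j);
            picksSinceAging++;
        }
        return j;
    }
}

Submitting the twelve jobs above in arrival order and calling next() repeatedly reproduces the execution order 1, 1, 1, 2, 2, 10, 2, 3, 4, 4, 8, 11.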

 

//——————————————————————————————————

 

3) (20 points) Give two
examples for extensible kernels which provide flexible OS services. Explain how
each one achieves the flexibility and ensures the safety.

 

Answer:

Examples: Exokernel and SPIN (the L3 microkernel is another).

 

Exokernel:

Exokernel achieves flexibility by securely multiplexing hardware resources: Aegis, the exokernel implementation, exports a small set of primitive operations through which the system can be altered. Higher-level abstractions such as virtual memory (VM) and interprocess communication (IPC) are implemented by library operating systems, which work on top of the exokernel interface; because their implementations can be customized per application, a higher level of abstraction is achieved without sacrificing flexibility.

 

Exokernel uses three methods to export resources securely: secure bindings, visible resource revocation, and an abort protocol. Safety is ensured by the secure bindings, through which applications bind to machine resources and handle events securely. Three main techniques are used to implement secure bindings: managing access to all hardware accurately, checking authorizations to resources, and employing a software TLB to cache secure bindings (Engler, Kaashoek, & O'Toole, 1995).
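As an illustration of the secure-binding idea (all names here are hypothetical, not Aegis's actual API): the expensive authorization decision happens once, at bind time, so that each later access needs only a cheap check, analogous to a software-TLB hit:

import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of secure bindings: bind-time authorization
// plus a fast ownership check on every subsequent access.
class SecureBindings {
    private final Map<Integer, Integer> pageOwner = new HashMap<>(); // page -> app id

    boolean bindPage(int appId, int physicalPage) {
        if (pageOwner.containsKey(physicalPage)) return false; // already bound
        pageOwner.put(physicalPage, appId);                    // record the binding
        return true;
    }

    boolean checkAccess(int appId, int physicalPage) {
        Integer owner = pageOwner.get(physicalPage); // cheap check on every access
        return owner != null && owner == appId;
    }
}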

 

SPIN:

SPIN achieves flexibility by providing kernel extensions that update or extend the operating system's code. Extensions are dynamically linked and bound, and an event-driven model supports service update and customization.

SPIN's safety is ensured by programming-language facilities, and extensions are verified by the compiler. The language used is Modula-3, a modular, type-safe language that enforces the boundaries between interface modules.

Co-location supports logical protection domains, modularity, and dynamic binding, which let interfaces be defined and accessed safely with small overhead (B. N. Bershad et al., 1995).

//———————————————————————————————————

 

4) (10 points) What is the
role of Remote Interface in RMI? What does binding mean in RMI?

Answer:

 

A remote interface is an interface that declares the methods of a remote object (methods that can be invoked from a non-local virtual machine). Clients of remote objects interact through the remote interfaces, never through the implementation classes of those interfaces; RMI is then used to invoke a method on the remote object (Wollrath, Riggs, & Waldo, 1996).

Binding is the process of registering a name for a remote object so that the name can later be used to look up that object. A remote object can be registered using the bind() or rebind() methods of the Naming class (Wollrath et al., 1996).

Thus, before the first call, the client must bind to a server interface. Once the bind has occurred, the client is authorized to access the interface's procedures (Brian N. Bershad, Anderson, Lazowska, & Levy, 1990).
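A minimal Java RMI sketch tying these pieces together (the Greeter names are mine, and it assumes an RMI registry is running on localhost): a remote interface, its server-side implementation, binding under a name, and a client lookup:

import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// The remote interface: the only view clients ever have of the object.
interface Greeter extends Remote {
    String greet(String name) throws RemoteException;
}

// Server-side implementation class; clients never see this type.
class GreeterImpl extends UnicastRemoteObject implements Greeter {
    GreeterImpl() throws RemoteException { super(); }
    public String greet(String name) throws RemoteException { return "Hello, " + name; }
}

class Server {
    public static void main(String[] args) throws Exception {
        // Binding: register a name for the remote object in the registry.
        Naming.rebind("rmi://localhost/Greeter", new GreeterImpl());
    }
}

class Client {
    public static void main(String[] args) throws Exception {
        // Look up the name; the stub returned is typed by the remote interface.
        Greeter g = (Greeter) Naming.lookup("rmi://localhost/Greeter");
        System.out.println(g.greet("world"));
    }
}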

 

//——————————————————————————————————-

 

5) (10 points) What are the
sources of RPC (Remote Procedure Call) overhead? Explain briefly.

Answer:

Sources of RPC overhead (Thekkath & Levy, 1993):

1. Marshaling
2. Data copying
3. Control transfer
4. Protocol processing

 

Marshaling and data copying

Three copies:

1. Client stub: the RPC stubs create the illusion of a procedural interface for the client and server; these stub procedures marshal the call arguments into and out of the message packet.

2. Kernel buffer: the message is copied into a kernel buffer (or the user's buffer is mapped into the kernel).

3. DMA to the controller: with a controller that can scatter/gather DMA over the bus, the host marshals the data in host memory and the controller then transfers it over the bus.

Reducing to two copies: the three copies above can be reduced to two either by marshaling directly into the kernel buffer, or by sharing descriptors for the arguments between the client stub and the kernel.
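To make the stub's role concrete, here is an illustrative Java marshaling routine (the wire format is hypothetical, not from the paper): the stub flattens the call's typed arguments into one contiguous buffer so lower layers can transmit it without per-argument copies:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative client-stub marshaling with a made-up wire format.
class Stub {
    static ByteBuffer marshalCall(int procedureId, int arg, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + 4 + nameBytes.length);
        buf.putInt(procedureId);      // which remote procedure to invoke
        buf.putInt(arg);              // fixed-size argument
        buf.putInt(nameBytes.length); // length prefix for variable-size argument
        buf.put(nameBytes);
        buf.flip();                   // ready for the transport to read
        return buf;
    }
}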

 

Control transfer

Four context switches: an RPC call requires four context switches: (1) the client switching out, (2) the server switching in, (3) the server switching out, and (4) the client switching back in. Reducing to two switches: only two of these are on the critical path of latency; the others can be overlapped with network communication. Reducing to one switch: one of the two remaining switches can be eliminated if the client spins instead of switching out.

 

Protocol processing

High-level acknowledgement: since most messages are delivered successfully, acknowledgement can be handled at a high level.

Hardware checksumming: used to improve packet integrity checking.

No buffering on the client side: because the client is blocked for the duration of the call, the data needed for retransmission remains available in the client's address space, just as for the original call; buffering therefore adds no latency to messages. Server-side buffering is overlapped with transmission of the result on the network, so there is no added latency in the reply path (Thekkath & Levy, 1993).

 


 

//——————————————————————————————————-

 

6) (20
points) Compare Xen and ESX by explaining which one is more suitable for what
kind of usage scenarios.

Answer:

In general, Xen outperforms ESX Server in most benchmarks (Barham et al., 2003).

 

 

Comparison of ESX (Waldspurger, 2002) and Xen (Barham et al., 2003):

Page sharing: ESX Server supports content-based page sharing, allowing virtual machines to reduce memory consumption by sharing identical pages; it monitors sharing only within a single host.

Hosting: in Xen, a single virtual machine hosts a real OS that may itself multiplex thousands of unmodified user-level processes, and Xen can host up to 100 virtual machine instances on a modern server.

Hardware resources: ESX multiplexes hardware resources; Xen likewise lets multiple operating systems share the hardware safely, and is suitable for Linux and Windows XP guests.

Memory reclamation: ESX reclaims memory by ballooning, which works for both Linux and Windows guests.

Memory allocation policies: supported by both systems.

Resource allocation: in ESX, resource rights are distributed to clients using shares; Xen supports resource allocation as well.

Idle memory tax (ESX): clients holding idle pages are penalized relative to clients with active pages when memory is scarce.

Dynamic allocation (ESX): supported, using thresholds.

I/O page remapping (ESX): hot pages are remapped from high physical memory addresses to lower machine addresses, reducing redundancy and page-copying overhead.

Supported guests: ESX: Windows and Linux; Xen: Linux, Windows XP, and BSD.

OS instantiation (Xen): users dynamically instantiate operating systems to execute what they want (Barham et al., 2003).
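The page-sharing row is the most mechanism-heavy, so here is a toy Java sketch of the content-based approach the ESX paper describes (the class and method names are mine): hash each page's contents, and on a hash match confirm with a full comparison before sharing:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy sketch of content-based page sharing in the spirit of ESX: on a
// confirmed content match, both virtual pages would be backed by one
// machine page (marked copy-on-write in a real VMM).
class PageSharing {
    private final Map<Integer, byte[]> byHash = new HashMap<>(); // hash -> canonical page

    byte[] tryShare(byte[] page) {
        int h = Arrays.hashCode(page);
        byte[] canonical = byHash.get(h);
        if (canonical != null && Arrays.equals(canonical, page)) {
            return canonical; // identical content: back both pages by one copy
        }
        byHash.putIfAbsent(h, page); // no match: this page becomes a candidate
        return page;
    }
}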

 

//——————————————————————————————————-

 

7) (10 points) Explain the
differences between the operations of RVM (recoverable virtual memory) and
Vista RVM.

Answer:

Figure 7.1: operations of the RVM system (Lowell & Chen, 1997)

As shown in the figure above, RVM makes three copies for every transaction:

1. Undo log: a copy of the old values, kept in memory during the transaction.

2. Redo log: a copy of the changes, written to disk during the transaction.

3. Log truncation: at least one extra copy for each updated datum in the log.

 

In RVM, crash recovery is handled by replaying committed transactions from the redo log onto the database.

Figure 7.2: Vista operations (Lowell & Chen, 1997)

As shown in Figure 7.2, Vista's operations use the Rio file cache, which has two main operations: write (copy data from an application buffer to the file cache) and mmap (map part of the file cache directly into the application's address space).

1. When a transaction starts, Vista copies the before-images to the undo log.

2. During the transaction, the application writes its permanent data directly in place.

3. When the transaction ends, the undo log is discarded.
 

In Vista, crash recovery is handled like an abort, using the undo log, which survives crashes because it resides in Rio.

As is clear from the figures, Vista removes two of the three operations that exist in RVM: it keeps the undo log, but uses no redo log, makes no system calls, and performs only one memory-to-memory copy (Lowell & Chen, 1997).
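A minimal Java sketch of this undo-log-only discipline, with an in-memory array standing in for Rio-mapped persistent memory (the class is illustrative, not Vista's API):

import java.util.HashMap;
import java.util.Map;

// Vista-like transactions: only an undo log of before-images is kept;
// commit discards it, and abort (or crash recovery) replays it.
class VistaLikeTx {
    private final int[] data; // stands in for mapped persistent memory
    private final Map<Integer, Integer> undoLog = new HashMap<>();

    VistaLikeTx(int size) { data = new int[size]; }

    void write(int addr, int value) {
        undoLog.putIfAbsent(addr, data[addr]); // the one copy: the before-image
        data[addr] = value;                    // update permanent data in place
    }

    void commit() { undoLog.clear(); } // no redo log, nothing to replay

    void abort() {                     // crash recovery works the same way
        undoLog.forEach((addr, old) -> data[addr] = old);
        undoLog.clear();
    }
}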

 

//——————————————————————————————————-

 

8) (10 points) Explain how
GMS (global memory system) algorithm handles the global memory hit and global
memory miss cases.

Answer:

When a fault occurs, GMS distinguishes four cases (Feeley et al., 1995):

1. Global memory hit: the faulted page is in the global memory of another node Q. The desired page in Q's global memory is swapped with any global page in P's global memory. As a result, the faulted page becomes a local page and the size of P's local memory increases by one, while Q's memory balance stays as it is.

2. Global memory hit, with P holding only local pages: the faulted page is in the global memory of node Q, and P's memory contains only local pages. The LRU local page on P is exchanged with the faulted page on Q. The sizes of the global memory on Q and the local memory on P are unchanged.

3. Global memory miss: the page is on disk (outside the cluster). The faulted page is read into the local memory of node P; the oldest page in the cluster (on node Q) is selected and written to disk; and some global page from P is sent to the global memory of Q.

4. Shared-page hit: the faulted page is a shared page located in the local memory of another node Q. A copy of the faulted page is placed into a frame on node P; the oldest page in the cluster (on node R) is selected and written to disk; and a global page on node P is sent to the global memory of node R.

These four cases are made concrete in the sketch below.
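A schematic Java rendering of the case analysis (the types and names are mine, purely to make the four cases concrete):

// Skeleton of the GMS page-fault handling on node P.
class GmsFaultHandler {
    enum Where { GLOBAL_ON_Q, GLOBAL_ON_Q_P_ALL_LOCAL, ON_DISK, SHARED_LOCAL_ON_Q }

    void handleFault(Where w) {
        switch (w) {
            case GLOBAL_ON_Q:
                // Hit: swap the faulted page with any global page on P;
                // it becomes local on P, and Q's balance is unchanged.
                break;
            case GLOBAL_ON_Q_P_ALL_LOCAL:
                // Hit: exchange P's LRU local page with the page on Q;
                // Q's global and P's local sizes are unchanged.
                break;
            case ON_DISK:
                // Miss: read the page into P's local memory, write the
                // cluster's oldest page (on Q) to disk, and send one of
                // P's global pages to Q.
                break;
            case SHARED_LOCAL_ON_Q:
                // Copy the shared page into a frame on P, write the
                // cluster's oldest page (on R) to disk, and send a
                // global page from P to R.
                break;
        }
    }
}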

//——————————————

 


 

 

References

Barham, P., Dragovic,
B., Fraser, K., Hand, S., Harris, T., Ho, A., … Warfield, A. (2003). Xen and
the art of virtualization. Proceedings of the Nineteenth ACM Symposium on
Operating Systems Principles  – SOSP ’03,
164. https://doi.org/10.1145/945445.945462

Bershad, B. N.,
Anderson, T. E., Lazowska, E. D., & Levy, H. M. (1990). Lightweight remote
procedure call. ACM Transactions on Computer Systems, 8(1),
37–55. https://doi.org/10.1145/77648.77650

Bershad, B. N.,
Savage, S., Pardyak, P., Sirer, E. G., Fiuczynski, M. E., Becker, D., … Eggers,
S. (1995). Extensibility safety and performance in the SPIN operating system. ACM
SIGOPS Operating Systems Review, 29(5), 267–283.
https://doi.org/10.1145/224057.224077

Engler, D. R., Kaashoek, M. F., & O'Toole, J. (1995). Exokernel: An operating system architecture for application-level resource management. ACM SIGOPS Operating Systems Review, 29(5), 251–266. https://doi.org/10.1145/224057.224076

Feeley, M. J., Morgan,
W. E., Pighin, E. P., Karlin, A. R., Levy, H. M., & Thekkath, C. A. (1995).
Implementing global memory management in a workstation cluster. Proceedings
of the Fifteenth ACM Symposium on Operating Systems Principles  – SOSP ’95, 201–212.
https://doi.org/10.1145/224056.224072

Lowell, D. E., &
Chen, P. M. (1997). Free transactions with Rio Vista. ACM SIGOPS Operating
Systems Review, 31(5), 92–101. https://doi.org/10.1145/269005.266665

Mellor-Crummey, J. M.,
& Scott, M. L. (1991). Algorithms for scalable synchronization on
shared-memory multiprocessors. ACM Transactions on Computer Systems, 9(1),
21–65. https://doi.org/10.1145/103727.103729

Squillante, M. S., & Lazowska, E. D. (1993). Using processor-cache affinity information in shared-memory multiprocessor scheduling. IEEE Transactions on Parallel and Distributed Systems, 4(2), 131–143. https://doi.org/10.1109/71.207589

Thekkath, C. A., &
Levy, H. M. (1993). Limits to low-latency communication on high-speed networks.
ACM Transactions on Computer Systems, 11(2), 179–203.
https://doi.org/10.1145/151244.151247

Waldspurger, C. A.
(2002). Memory resource management in VMware ESX server. ACM SIGOPS
Operating Systems Review, 36(SI), 181.
https://doi.org/10.1145/844128.844146

Wollrath, A., Riggs, R., & Waldo, J. (1996). A distributed object model for the Java system. USENIX Conference on Object-Oriented Technologies (COOTS), 265–290. Retrieved from http://www.usenix.org/publications/computing/9.4.html