CENG521 Final Exam answers
Name: Nourhan Abuzayed
St. No.: 242630001
1) Discuss the performance of the spinlock algorithms, including the naive test-and-set, ticket, Anderson array-based queueing, and MCS list-based queueing locks, using the performance results provided in the paper we discussed in the lecture.
A spin lock is a mechanism for a thread to make sure that when it is accessing some shared data, it is not interfered with by other threads: the thread acquires the lock, does its work on the data, then releases the lock.
The paper measures the performance of various spin lock and barrier algorithms on the BBN Butterfly 1, a distributed shared-memory multiprocessor, and the Sequent Symmetry Model B, a cache-coherent, shared-bus multiprocessor.
(Explanation of Figure 4: performance of spin locks on the Butterfly, empty critical section)
The performance of the spin locks on the Butterfly is as follows: as the number of processors increases, the simple test-and-set lock shows the worst behavior in terms of measured time. The ticket lock without backoff is a little faster, because it polls with a read instead of a fetch-and-Φ operation. Anderson's lock is much better than the previous two, while the MCS lock performs the best.
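The difference in polling behavior between the test-and-set lock and the ticket lock can be sketched in C (my own minimal sketch using C11 atomics, not the paper's code; the names are mine):

```c
#include <stdatomic.h>

/* Naive test-and-set lock: every waiter hammers the shared word with
 * an atomic read-modify-write, so contention grows with the number of
 * spinning processors. */
typedef struct { atomic_flag held; } ts_lock_t;

static void ts_acquire(ts_lock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* spin with repeated atomic operations */
}

static void ts_release(ts_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Ticket lock: one fetch-and-increment takes a ticket, then the waiter
 * polls now_serving with plain reads only. */
typedef struct {
    atomic_uint next_ticket;
    atomic_uint now_serving;
} ticket_lock_t;

static void ticket_acquire(ticket_lock_t *l) {
    unsigned me = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != me)
        ;  /* polling with a read, not a read-modify-write */
}

static void ticket_release(ticket_lock_t *l) {
    atomic_fetch_add_explicit(&l->now_serving, 1, memory_order_release);
}
```

A test-and-set waiter repeats an atomic read-modify-write on the shared flag, while a ticket waiter performs one fetch-and-increment and then only reads, which is why the ticket lock is a little cheaper under contention.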
(Explanation of Figure 5: performance of selected spin locks on the Butterfly, empty critical section)
As the number of processors increases, the test-and-set algorithm has the worst behavior (maximum time), with the time to acquire and release the lock increasing dramatically even over this small range of processors. A simple test-and-set would perform even worse.
(Explanation of Figure 6: performance results for several spin lock algorithms on the Symmetry)
Here the data structures are adjusted in minor ways to avoid the unnecessary invalidations that result from placing unrelated data items in the same cache line. The test-and-set algorithm again shows the worst behavior, with the time to acquire and release the lock increasing dramatically even over this small range of processors; a simple test-and-set would perform even worse.
Because the atomic add instruction on the Symmetry does not return the old value of its target location, the ticket lock cannot be implemented on the Symmetry, nor can Anderson's lock be implemented directly. In his implementation, Anderson introduced an outer test-and-set lock with randomized backoff to protect the state of his queue. This strategy is reasonable when the critical section protected by the outer lock (namely, acquisition or release of the inner lock) is substantially smaller than the critical section protected by the inner lock. This was not the case in the initial test, so the graph in the figure actually results from processors contending for the outer lock instead of the inner queue-based lock.
(Mellor-Crummey & Scott, 1991)
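The MCS list-based queueing lock, which scales best in these measurements, can be sketched as follows (my own C11-atomics sketch of the published algorithm, not the paper's code): each processor spins only on a flag in its own queue node, so no hot shared location is polled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* MCS list-based queueing lock: waiters form a linked queue and each
 * spins only on the flag in its own node (local memory). */
typedef struct mcs_node {
    struct mcs_node *_Atomic next;
    atomic_bool locked;
} mcs_node_t;

typedef struct { mcs_node_t *_Atomic tail; } mcs_lock_t;

static void mcs_acquire(mcs_lock_t *l, mcs_node_t *me) {
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    mcs_node_t *pred = atomic_exchange(&l->tail, me); /* join the queue */
    if (pred != NULL) {
        atomic_store(&pred->next, me);   /* link behind our predecessor */
        while (atomic_load(&me->locked))
            ;                            /* spin on our own node only */
    }
}

static void mcs_release(mcs_lock_t *l, mcs_node_t *me) {
    mcs_node_t *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No visible successor: try to swing the tail back to empty. */
        mcs_node_t *expected = me;
        if (atomic_compare_exchange_strong(&l->tail, &expected, NULL))
            return;
        /* A successor has swapped the tail but not linked yet: wait. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->locked, false);  /* hand the lock straight over */
}
```

Handing the lock directly to the successor is what gives the FIFO ordering and the flat scaling seen in the figures.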
2) (10 points) Discuss the performance of the scheduling policies, including First Come First Served, Fixed Processor, Last Processor, and Minimum Intervening, using the performance results provided in the paper we discussed in the lecture.
Scheduling is assigning processors to execute threads. Throughput is the number of threads completed in unit time.
2.1: Throughput performance of FCFS, LP, FP (Squillante & Lazowska, 1993)
As shown in Figure 2.1, with light loads the performance of LP and FCFS is the same, and FP is better for small values of C; for larger C, the load-imbalance problems of the FP policy appear. With higher loads, LP and FP differ only slightly, and FCFS is dominated for most values of C because of its cache-reload overhead.
2.2: Throughput performance of MI, FCFS, LP, FP (Squillante et al., 1993)
As shown in Figure 2.2, with light loads MI is better than LP, and as C increases the difference increases. This is due to the use of more values for X and T, and to the affinity information that MI exploits. With higher loads, MI outperforms LP and FCFS; as C increases, FCFS throughput decreases, while FP is not affected.
2.3: Variance in response time of MI, LMR, FCFS, LP, FP (Squillante et al., 1993)
MI is worse than LP in terms of variance in response time: some tasks get excellent service while others get poor service, since MI makes locally optimal scheduling decisions based on the cache-reload transient penalty. The greedy algorithms (MI, LP) have high throughput and low response time, but they suffer from high variance in response time and latency under high loads (Squillante et al., 1993).
Propose a scheduling policy that can perform better than these four policies.
In this proposed solution, I suggest that the task with the shortest time be performed before the others, while ensuring that the other tasks cannot be postponed beyond a specified limit. For example, after 5 short jobs we select a job from among all the jobs that have been waiting without getting their turn, so that no job waits forever, or for a very long time in the case where all arriving jobs are shorter than it. In this way we guarantee fast execution of short jobs while preventing long jobs from waiting a very long time. This solution also ensures fairness, whose absence is a disadvantage of the greedy algorithms.
Example: suppose jobs arrive in the following sequence, with execution times 1, 1, 1, 2, 2, 10, 2, 2, 3, 4, 4, 11, 8. If we schedule purely by shortest time, the 10 and the 11 wait too long (unfairly) compared with the other short jobs, but in the proposed approach they are selected earlier (after every 5 selections).
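The proposed policy on this example can be sketched as follows (an illustrative sketch; the window of 5, the job list, and the lowest-index tie-breaking are my own choices):

```c
/* Sketch of the proposed policy: shortest job first, but after every
 * WINDOW shortest-first picks, the longest-waiting job runs instead,
 * so long jobs cannot starve. */
#define WINDOW 5
#define NJOBS  13

/* Execution times in arrival order; lower index = arrived earlier. */
static const int burst[NJOBS] = {1, 1, 1, 2, 2, 10, 2, 2, 3, 4, 4, 11, 8};

/* Fills order[] with job indices in execution order. */
static void schedule(int order[NJOBS]) {
    int done[NJOBS] = {0};
    int short_picks = 0;
    for (int k = 0; k < NJOBS; k++) {
        int pick = -1;
        if (short_picks == WINDOW) {
            /* Aging turn: run the earliest-arrived job still waiting. */
            for (int i = 0; i < NJOBS; i++)
                if (!done[i]) { pick = i; break; }
            short_picks = 0;
        } else {
            /* Normal turn: run the shortest job still waiting. */
            for (int i = 0; i < NJOBS; i++)
                if (!done[i] && (pick < 0 || burst[i] < burst[pick]))
                    pick = i;
            short_picks++;
        }
        done[pick] = 1;
        order[k] = pick;
    }
}
```

On this sequence the 10-unit job (index 5) runs as the sixth selection and the 11-unit job (index 11) as the twelfth, instead of both being pushed to the end as under pure shortest-first.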
3) (20 points) Give two
examples for extensible kernels which provide flexible OS services. Explain how
each one achieves the flexibility and ensures the safety.
Two examples are the Exokernel and SPIN (the L3 microkernel is another).
The Exokernel achieves flexibility by securely multiplexing hardware resources: Aegis exports a group of OS primitive operations that allow the system to be altered, while higher-level abstractions such as virtual memory (VM) and interprocess communication (IPC) are implemented outside the kernel. The implementation can be customized by employing library operating systems, which work on top of the Exokernel interface; as a result, they can provide higher levels of abstraction.
The Exokernel uses three methods to export resources securely: secure bindings, visible resource revocation, and the abort protocol. Safety is ensured by the secure bindings, through which applications handle events and bind to machine resources securely. Three main techniques are used to implement secure bindings: managing access to all hardware accurately, controlling authorization to resources, and employing a software TLB to cache secure bindings (Engler, Kaashoek, & O'Toole, 1995).
SPIN achieves flexibility by providing kernel extensions to update or extend the operating system code; the extensions are linked and bound dynamically, and an event-driven model supports service update and customization.
SPIN's safety is ensured by programming-language facilities, and the extensions are verified by the compiler. The programming language used is Modula-3, a modular language that enforces boundaries between interfaces: it supports logical protection domains, modularity, and dynamic binding, which let interfaces be defined and accessed safely with small overhead (B. N. Bershad et al., 1995).
4) (10 points) What is the
role of Remote Interface in RMI? What does binding mean in RMI?
A remote interface is an interface that declares the methods of a remote object (methods that can be invoked from a non-local virtual machine), such that the clients of remote objects interact through the remote interfaces, not through the implementation classes of those interfaces. RMI is then used to invoke a method on a remote object (Wollrath, Riggs, & Waldo, 1996).
Binding is the process of registering a name for a remote object; the name may be used later to look up that object. A remote object can be registered using the bind() or rebind() methods of the Naming class (Wollrath et al., 1996).
Thus, before the first call, the client should bind to a server interface. Once the bind has occurred, the client is authorized to access the interface's procedures (B. N. Bershad, Anderson, Lazowska, & Levy, 1990).
5) (10 points) What are the
sources of RPC (Remote Procedure Call) overhead? Explain briefly.
The sources of RPC overhead are data copying, context switching, and protocol processing (Thekkath & Levy, 1993).
Data copying: an RPC may involve three copies.
1 – Client stub: the RPC stubs create the illusion of a procedural interface for the client and server; these procedures marshal the call's arguments into and out of the message packet.
2 – Kernel buffer: the user's buffer is mapped (copied) into the kernel.
3 – DMA to controller: with a controller that can scatter/gather DMA over the bus, the host marshals data in host memory and the controller then transfers the data over the bus.
Reduce to two copies: the three copies above can be reduced to two by either marshaling into the kernel buffer directly, or sharing descriptors for the arguments between the client stub and the kernel.
Context switching: an RPC call requires four context switches: 1 – client switching out, 2 – server switching in, 3 – server switching out, 4 – client switching back in. Reduce to two switches: only two are in the critical path of latency, while the others can be overlapped with network communication. Reduce to one switch: one of the remaining switches can be eliminated if the client spins instead of switching.
Protocol processing:
– High-level acknowledgement: most messages are delivered, so acknowledgement can be handled at a high level.
– Hardware checksumming: to improve packet integrity.
– No buffering on the client side: because the client is blocked for the duration of the call, retransmission can proceed from the client's address space exactly as for the original call; since there is no buffering, it adds no latency to messages.
– Overlap server-side buffering with result transmission on the network: so there is no latency in the reply path (Thekkath & Levy, 1993).
6) Compare Xen and ESX by explaining which one is more suitable for what kind of usage scenarios.
In general, Xen outperforms ESX Server in most benchmarks (Barham et al., 2003).
Xen multiplexes hardware resources, allowing multiple operating systems to share the hardware safely; it is suitable for guests such as Linux and Windows XP. Each Xen virtual machine hosts a real OS that may multiplex thousands of unmodified user-level processes, and Xen can host up to 100 virtual machine instances on a modern server (Barham et al., 2003).
ESX Server supports page sharing, allowing virtual machines to reduce memory consumption by sharing identical pages; ESX only looks for sharable memory within a single host. ESX does memory reclamation using ballooning, so it can be used with Linux and Windows guests.
Memory allocation policies: ESX supports them. Resource rights are distributed to clients using shares, with an idle memory tax: clients holding idle pages are penalized compared with active ones when memory is scarce, controlled by thresholds.
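The idle memory tax arithmetic can be illustrated as follows (a sketch based on the formulation in Waldspurger (2002); the function name and example values are mine): with tax rate tau, an idle page is charged k = 1/(1 - tau) times an active page when computing a client's shares-per-page ratio.

```c
/* Sketch of the ESX idle memory tax arithmetic (Waldspurger, 2002):
 * each idle page costs k = 1/(1 - tau) times an active page, so a
 * client's shares-per-page ratio shrinks as the fraction of its pages
 * that are idle grows, making it the preferred victim for reclamation. */
static double adjusted_ratio(double shares, double pages,
                             double active_frac, double tau) {
    double k = 1.0 / (1.0 - tau);                 /* idle page cost */
    return shares / (pages * (active_frac + k * (1.0 - active_frac)));
}
```

With tau = 0.75 (k = 4), a client whose pages are half idle has its ratio cut by 2.5x relative to a fully active client with the same shares and pages.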
I/O page remapping: ESX remaps hot pages from high physical memory addresses to lower machine addresses; this reduces redundancy and page-copying overhead.
Xen guests include Linux, Windows XP, and BSD; in Xen, users dynamically instantiate an OS to execute what they want (Barham et al., 2003).
7) (10 points) Explain the differences between the operations of RVM (recoverable virtual memory) and Vista.
Figure 7.1: operations of the RVM system (Lowell & Chen, 1997)
As shown in the figure above, RVM performs three copies for every transaction:
– undo log: a copy of the old values into the in-memory undo log (when the transaction starts);
– redo log: a copy of the changes to the on-disk redo log (when the transaction commits);
– truncation: at least one extra copy for each updated datum when the log is truncated into the database.
Crash recovery is handled by replaying committed transactions from the redo log to the database.
Figure 7.2: Vista operations (Lowell & Chen, 1997)
As shown in Figure 7.2, Vista's operations use the Rio file cache, which has two main operations: write (copy data from an application buffer to the file cache) and mmap (map part of the file cache directly into the application's address space). When a transaction starts, Vista copies the before-images to the undo log. During the transaction, permanent data are written directly by the application. When the transaction ends, the undo log is discarded. Crash recovery in Vista is treated like an abort, using the undo log, which survives crashes because it resides in Rio.
As is clear from the figures, Vista removes two of the three copy operations that exist in RVM: it keeps the undo log but uses no redo log, no system calls, and only one memory-to-memory copy (Lowell & Chen, 1997).
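Vista's single-copy transaction style can be sketched over a plain memory region (my own toy sketch; the real Vista operates on the mapped Rio file cache, and the names and sizes here are mine):

```c
#include <string.h>

/* Toy sketch of a Vista-style transaction: one memory-to-memory copy
 * at begin (the before-image), nothing at commit, and a restore from
 * the before-image at abort. No redo log, no system calls. */
#define REGION 64

typedef struct {
    char data[REGION];   /* stands in for the mapped permanent data */
    char undo[REGION];   /* before-image taken when the tx starts */
} tx_t;

static void tx_begin(tx_t *t) {
    memcpy(t->undo, t->data, REGION);  /* the single copy */
}

/* During the transaction, writes go straight to the data region. */
static void tx_write(tx_t *t, int off, const char *src, int len) {
    memcpy(t->data + off, src, (size_t)len);
}

static void tx_commit(tx_t *t) {
    (void)t;                           /* just discard the undo log */
}

static void tx_abort(tx_t *t) {
    memcpy(t->data, t->undo, REGION);  /* roll back to the before-image */
}
```

Crash recovery works the same way as tx_abort: the surviving before-image is copied back over the data.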
8) (10 points) Explain how the GMS (global memory system) algorithm handles the global memory hit and global memory miss cases.
When the faulted page is in the global memory of another node (Q), it is handled as follows: the desired page in Q's global memory is swapped with any global page in P's global memory. As a result, the faulted page becomes a local page and the size of P's local memory increases by one, while Q's memory balance stays as it is (global memory hit).
When the faulted page is in the global memory of node Q but P's memory holds only local pages, the LRU local page on P is exchanged with the faulted page on Q. The sizes of the global memory on Q and the local memory on P are unchanged (global memory hit).
When the page is on disk (outside the cluster), the faulted page is read into the local memory of node P, the oldest page in the cluster (on node Q) is written to disk, and a global page from P is sent to the global memory of Q (global memory miss).
When the faulted page is a shared page located in the local memory of another node (Q), it is handled by copying that page into a frame on node P, selecting the oldest page in the cluster (on node R) and writing it to disk, and sending a global page from node P to the global memory of node R (Feeley et al., 1995).
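The bookkeeping effects of these cases can be sketched as follows (a toy sketch of the page-count accounting only; the names and structure are mine, not the paper's code):

```c
/* Toy bookkeeping for the GMS fault cases above: only the local and
 * global page counts of the faulting node P and remote node Q are
 * modeled. */
typedef struct { int local_pages, global_pages; } node_mem_t;

/* Global memory hit, P has a global page to trade: the faulted page
 * becomes local on P; Q swaps one global page for another. */
static void gms_hit_swap_global(node_mem_t *p, node_mem_t *q) {
    p->global_pages -= 1;
    p->local_pages  += 1;
    (void)q;  /* Q's balance is unchanged */
}

/* Global memory hit, but P holds only local pages: P's LRU local page
 * is exchanged with the faulted page on Q, so both sizes stay fixed. */
static void gms_hit_swap_lru(node_mem_t *p, node_mem_t *q) {
    (void)p; (void)q;  /* no sizes change on either node */
}

/* Global memory miss (page on disk): the faulted page is read into P's
 * local memory, the cluster's oldest page (on Q) goes to disk, and one
 * global page moves from P to Q to make room. */
static void gms_miss_from_disk(node_mem_t *p, node_mem_t *q) {
    p->local_pages  += 1;
    p->global_pages -= 1;
    (void)q;  /* Q evicts one page and receives one, net unchanged */
}
```

The invariant in every case is that the cluster-wide page total changes only when a page is read from or written to disk.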
Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., … Warfield, A. (2003). Xen and the art of virtualization. Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles – SOSP '03.
Bershad, B. N.,
Anderson, T. E., Lazowska, E. D., & Levy, H. M. (1990). Lightweight remote
procedure call. ACM Transactions on Computer Systems, 8(1).
Bershad, B. N.,
Savage, S., Pardyak, P., Sirer, E. G., Fiuczynski, M. E., Becker, D., … Eggers,
S. (1995). Extensibility safety and performance in the SPIN operating system. ACM
SIGOPS Operating Systems Review, 29(5), 267–283.
Engler, D. R., Kaashoek, M. F., & O'Toole, J. (1995). Exokernel: An operating system architecture for application-level resource management. ACM SIGOPS Operating Systems Review, 29(5), 251–266.
Feeley, M. J., Morgan,
W. E., Pighin, E. P., Karlin, A. R., Levy, H. M., & Thekkath, C. A. (1995).
Implementing global memory management in a workstation cluster. Proceedings
of the Fifteenth ACM Symposium on Operating Systems Principles – SOSP ’95, 201–212.
Lowell, D. E., &
Chen, P. M. (1997). Free transactions with Rio Vista. ACM SIGOPS Operating
Systems Review, 31(5), 92–101. https://doi.org/10.1145/269005.266665
Mellor-Crummey, J. M.,
& Scott, M. L. (1991). Algorithms for scalable synchronization on
shared-memory multiprocessors. ACM Transactions on Computer Systems, 9(1).
Squillante, M. S., & Lazowska, E. D. (1993). Using processor-cache affinity information in shared-memory multiprocessor scheduling. IEEE Transactions on Parallel and Distributed Systems, 4(2), 131–143.
Thekkath, C. A., &
Levy, H. M. (1993). Limits to low-latency communication on high-speed networks.
ACM Transactions on Computer Systems, 11(2), 179–203.
Waldspurger, C. A.
(2002). Memory resource management in VMware ESX server. ACM SIGOPS
Operating Systems Review, 36(SI), 181.
Wollrath, A., Riggs, R., & Waldo, J. (1996). A distributed object model for the Java system. USENIX Conference on Object-Oriented Technologies, 9(June), 265–290. Retrieved from http://www.usenix.org/publications/computing/9.4.html