This year EuroSys took place in Prague, Czech Republic. EuroSys is one of the main systems conferences. There were 28 accepted papers out of 143 submissions (a 19% acceptance rate), a bit higher than in previous years, although the statistics excluded submissions that did not fulfill the requirements, which had been counted in the past. Only 6 of the accepted papers are European, and plenty come from MSR.
Monday 15th April
Session 1: Large scale distributed computation I
Adverts must be reliable! This takes more than monitoring (makes sense).
The contribution focuses on resilience and fault tolerance. They build a DAG that is rewritten dynamically, replacing links with hashed ones spanning several nodes. No information is lost: any subgraph can be substituted by an equivalent one, which reloads/recomputes the missing pieces.
Several optimizations are added, such as message batch aggregation and lightweight input/output dependency tracking to estimate impact.
Interesting work, and it was well presented.
Qifa Ke, Michael Isard, and Yuan Yu (Microsoft Research Silicon Valley)
Motivation: many aspects of large-scale computations cannot be known in advance. How should partition skew be handled? What is the right number of tasks (e.g. reducers)? Large-scale matrix multiplication is also argued to cause problems with intermediate steps. Iterative computation, or providing fault-tolerance capabilities? The paper proposes to optimize the EPG (Execution Plan Graph) at runtime to solve these issues.
The solution for reliability is interesting: keeping a 'cache' for intermediate data, and choosing either copy of the data for the next step.
[BEST PAPER AWARD]
Sameer Agarwal (University of California, Berkeley), Barzan Mozafari (Massachusetts Institute of Technology), Aurojit Panda (University of California, Berkeley), Henry Milner (University of California, Berkeley), Samuel Madden (Massachusetts Institute of Technology), and Ion Stoica (University of California, Berkeley)
Motivation: compute aggregate statistics on huge amounts of data.
SELECT ... WITHIN 2 SECONDS. Results come with an error bound and can be refined. The idea is very elegant and original, and was brilliantly presented.
Response time is estimated by querying small samples and extrapolating (it should be linear). So, how well does a sample cover the original query? It depends on what elements it has. They compute how complete the data will be; I don't quite get that part. A sample has a cost and a coverage, and an ILP problem determines which samples to build.
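The cost/coverage trade-off can be pictured as a tiny optimization problem. A toy sketch (all names and numbers invented; the paper solves this as an ILP, brute force stands in here):

```python
from itertools import combinations

def pick_samples(samples, budget):
    """Choose the set of samples that maximizes column coverage
    while the total storage cost stays within the budget."""
    best, best_cov = [], 0
    for r in range(len(samples) + 1):
        for combo in combinations(samples, r):
            if sum(s["cost"] for s in combo) > budget:
                continue  # over the storage budget, skip
            covered = set().union(*(s["columns"] for s in combo)) if combo else set()
            if len(covered) > best_cov:
                best, best_cov = list(combo), len(covered)
    return [s["name"] for s in best], best_cov

# hypothetical stratified samples over different column sets
samples = [
    {"name": "s_city", "columns": {"city"},        "cost": 2},
    {"name": "s_cd",   "columns": {"city", "day"}, "cost": 3},
    {"name": "s_user", "columns": {"user"},        "cost": 2},
]
chosen, cov = pick_samples(samples, budget=5)
```

Here the budget forces a choice, and the two samples covering three distinct columns win over the cheaper overlapping pair.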
Session 2: Security and Privacy
David Schultz and Barbara Liskov (MIT CSAIL)
Information flow control, by tagging database rows with labels stored in an extra column. It's far from my area, but I am surprised this has not been done already. The paper is well explained, and results show that the incurred overhead is small. A problem pointed out in the questions is that it requires manual tagging, which is in general extremely hard to do right.
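The core mechanism is easy to picture: a per-row label column, with reads filtered by the principal's clearances. A minimal sketch (column and label names are hypothetical, not the paper's schema):

```python
# Each row carries a label in an extra column.
rows = [
    {"id": 1, "ssn": "123-45-6789", "label": "medical"},
    {"id": 2, "ssn": None,          "label": "public"},
]

def select(rows, clearances):
    # A principal only sees rows whose label it is cleared for.
    return [r for r in rows if r["label"] in clearances]

public_view = select(rows, {"public"})              # sees only row 2
doctor_view = select(rows, {"public", "medical"})   # sees both rows
```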
Hayawardh Vijayakumar (The Pennsylvania State University), Joshua Schiffman (Advanced Micro Devices), and Trent Jaeger (The Pennsylvania State University)
There are many threats to file access; resource access control is very hard, and programmers are not going to get it right. System-call-level protection carries a huge overhead. Idea: reverse the threat protection approach. With introspection you protect vulnerable processes, instead of sandboxing dangerous attackers. So, declare unsafe resources for a specific process context.
It is interesting that processing firewall rules is much more efficient than implementing checks manually (which is also prone to errors). Declarative apparently wins by a huge margin.
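The declarative flavor can be sketched as a rule table consulted on each access, rather than ad hoc checks scattered through program code. A toy stand-in (process names, entrypoints, and rule format all invented):

```python
# Rules declare resources unsafe for a specific process context:
# (process, entrypoint, resource prefix) triples.
RULES = [
    ("webserver", "open_upload", "/etc/"),
    ("webserver", "open_upload", "/proc/"),
]

def allowed(process, entrypoint, path):
    # Block the access if any rule matches this process context.
    return not any(
        process == p and entrypoint == e and path.startswith(prefix)
        for p, e, prefix in RULES
    )

blocked = allowed("webserver", "open_upload", "/etc/passwd")   # False
ok = allowed("webserver", "serve_static", "/etc/hosts")        # True
```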
Srinath Setty, Benjamin Braun, Victor Vu, and Andrew J. Blumberg (UT Austin), Bryan Parno (Microsoft Research Redmond), and Michael Walfish (UT Austin)
They propose a cryptographic technique, Probabilistically Checkable Proofs (PCPs), for checking whether a server indeed performed some computation. It is not yet usable because of the huge computational cost.
Session 3: Replication
Sergio Almeida, João Leitão, and Luís Rodrigues (INESC-ID, Instituto Superior Técnico, Universidade Técnica de Lisboa)
ChainReaction is a geo-distributed K/V store that implements the existing Causal+ model for improved read performance over existing causal-replication-based systems. The work extends FAWN, adding capabilities for a geo-distributed setup. It was well presented, and the design decisions and results are clearly reflected in the paper.
Ricardo Padilha and Fernando Pedone (University of Lugano, Switzerland)
Byzantine Fault Tolerance (BFT) would be convenient in a cloud environment, but it heavily penalizes latency. The paper proposes single-partition transactions, and multi-partition read-only transactions. The restrictions in applicability look a bit severe, but it was nicely validated across different workloads. I am not sure the social network workload they generated was representative: they chose an arbitrary 50% chance of a connection being close, with no justification.
Tim Kraska, Gene Pang, and Michael Franklin (UC Berkeley), Samuel Madden (MIT), and Alan Fekete (University of Sydney)
The authors present MDCC, a replication technique that attempts to exploit two main observations about geo-distributed DBs: conflicting operations are often commutative, and conflicts are actually rare, as each client mostly updates their own data. With these, they implement a modified combination of Multi-Paxos and Fast Paxos, which lessens latency by reducing the number of phases in several cases. An extensive set of experiments points to a significant performance improvement over other transactional databases.
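The commutativity observation is the easy half to illustrate: delta-style updates can be accepted in different orders by different replicas and still converge, so replicas need not agree on an order. A minimal illustration (the scenario is invented, not from the paper):

```python
# Deltas from different clients against the same record, e.g. stock
# decrements and restocks. Order of application does not matter.
def apply_all(balance, updates):
    for u in updates:
        balance += u
    return balance

updates = [+5, -3, +2]
a = apply_all(100, updates)                  # one replica's order
b = apply_all(100, list(reversed(updates)))  # another replica's order
```

Both replicas end at the same value, which is what lets MDCC skip coordination on such operations.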
Session 4: Concurrency and Parallelism
Conversion: Multi-Version Concurrency Control for Main Memory Segments
Timothy Merrifield and Jakob Eriksson (University of Illinois at Chicago)
Cache coherence has become a main bottleneck in multi-core systems. Proposal: each process handles its own working copy for concurrent memory access. If processes can afford to work with a slightly out-of-date copy, performance can be significantly improved.
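The working-copy idea resembles multi-version concurrency control applied to memory segments. A toy sketch of the behavior (class and method names are mine, not the paper's API):

```python
# Each process checks out a private snapshot of a shared segment and
# publishes a new version on commit; readers tolerate slight staleness
# instead of contending on every access.
class VersionedSegment:
    def __init__(self, data):
        self.versions = [dict(data)]        # list of committed snapshots

    def checkout(self):
        return dict(self.versions[-1])      # private working copy

    def commit(self, copy):
        self.versions.append(dict(copy))    # publish a new version

seg = VersionedSegment({"x": 0})
writer = seg.checkout()
reader = seg.checkout()      # snapshots the same version
writer["x"] = 42
seg.commit(writer)
stale = reader["x"]          # still 0: reader keeps its old snapshot
fresh = seg.checkout()["x"]  # a new checkout sees the committed write
```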
Whose Cache Line Is It Anyway? Operating System Support for Live Detection and Repair of False Sharing
Mihir Nanavati, Mark Spear, Nathan Taylor, Shriram Rajagopalan, Dutch T. Meyer, William Aiello, and Andrew Warfield (University of British Columbia)
Writes to the same cache line from multiple cores force the line to be written back to main memory for everyone, which can have a huge impact on performance. Idea: split the page into an isolated page holding the conflicting data and an underlay page with no conflicts.
Adaptive Parallelism for Web Search
Myeongjae Jeon (Rice University), Yuxiong He (Microsoft Research), Sameh Elnikety (Microsoft Research), Alan L. Cox and Scott Rixner (Rice University)
In web search services (e.g. Bing), parallelism involves querying multiple index servers for results and aggregating them with techniques such as PageRank. However, index-server queries are sequential. The paper discusses the challenges of parallelizing in-server search. I don't think it is novel, but it is well explained.
Tuesday 16th April
Session 1: Large scale distributed computation II
Zuhair Khayyat, Karim Awara, and Amani Alonazi (King Abdullah University of Science and Technology), Hani Jamjoom and Dan Williams (IBM T. J. Watson Research Center, Yorktown Heights), and Panos Kalnis (King Abdullah University of Science and Technology)
Mizan is a Pregel-based system, implemented in C++, which optimizes execution time by dynamically migrating vertices every iteration. Each node's superstep execution is profiled, so that they statistically see which nodes perform slower and, if over a threshold, migrate them dynamically. Every worker has a match worker with less load, to which all migrations are headed. Node locations are handled through a DHT. The technique is interesting in the sense that it completely ignores graph topology. It might incidentally reduce the number of cut edges, but it does so only by looking at runtime statistics. However, the destination is chosen to load-balance CPU, so if the network is the bottleneck it might not be enough.
The solution works, although the validation has some problems: they only looked at graphs in the 2M range, which leads to questions about why not just do it on a single machine. Comparisons were against a specific version of Giraph which, as Greg Malewicz pointed out, had been vastly improved in the last six months.
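The matching step, as described in the talk, can be sketched as pairing slow workers with fast ones after each profiled superstep. A toy version (pairing rule simplified; the profiling statistics and DHT lookup are omitted):

```python
# times: worker -> measured superstep execution time.
def match_workers(times):
    order = sorted(times, key=times.get)          # fastest first
    # Pair the slowest worker with the fastest, 2nd slowest with the
    # 2nd fastest, and so on; migrations head to the match worker.
    return {slow: fast
            for slow, fast in zip(reversed(order), order)
            if times[slow] > times[fast]}

pairs = match_workers({"w1": 10.0, "w2": 3.0, "w3": 7.0, "w4": 4.0})
# w1 (slowest) is matched with w2 (fastest), w3 with w4
```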
Francisco Cruz, Francisco Maia, Miguel Matos, Rui Oliveira, Joao Paulo, Jose Pereira, and Ricardo Vilaca (HASLab / INESC TEC and U. Minho)
MeT is an HBase extension that performs elastic configuration of the slaves. Depending on the observed access patterns, it provides replication and load balancing. Monitoring runtime statistics feeds a decision algorithm that identifies suboptimal configurations from CPU usage. If a problem is detected, a distribution algorithm is run to spread the replicas.
Shivaram Venkataraman (UC Berkeley), Erik Bodzsar (University of Chicago), and Indrajit Roy, Alvin AuYoung, and Robert S. Schreiber (HP Labs)
Presto is an R library that parallelizes matrix computations across distributed machines (the darray construct, supporting foreach). The API is lightweight and requires minimal modification of R code. When parallelizing sparse matrices, Presto attempts to avoid the impact of skew across partitions.
An online repartitioning scheme profiles each partition to optimize further iterations: if the ratio is higher than a threshold, the problematic partition is split. It is integrated into R without modifying it, by hacking memory allocation, including object headers. The contribution is not major, but it is well thought out and described.
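The profile-and-split loop is simple to sketch. A toy version of the idea (threshold, naming, and the even split are my assumptions, not Presto's actual policy):

```python
# times: partition -> profiled execution time for the last iteration.
def repartition(times, threshold=2.0):
    avg = sum(times.values()) / len(times)
    out = {}
    for part, t in times.items():
        if t / avg > threshold:
            # skewed partition: split it for the next iteration,
            # modeled here as two halves with half the work each
            out[part + ".a"] = t / 2
            out[part + ".b"] = t / 2
        else:
            out[part] = t
    return out

new_parts = repartition({"p0": 1.0, "p1": 1.0, "p2": 10.0})
# p2 dominates the average and gets split; p0 and p1 are untouched
```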
Session 2: Operating Systems Implementation
Austin T. Clements, Frans Kaashoek, and Nickolai Zeldovich (MIT CSAIL)
Stan Park (University of Rochester), Terence Kelly (HP Labs), and Kai Shen (University of Rochester)
Andrew Baumann (Microsoft Research), Dongyoon Lee (University of Michigan), Pedro Fonseca (MPI Software Systems), and Jacob R. Lorch, Barry Bond, Reuben Olinsky, and Galen C. Hunt (Microsoft Research)
Session 3: Miscellaneous
Abhilash Jindal, Abhinav Pathak, Y. Charlie Hu, and Samuel Midkiff (Purdue University)
They analyze several sleep conflicts where the state machine of the smartphone fails and the device is not effectively suspended. Tested on Nexus One and Galaxy S devices (3+ years old). Looks a bit weak.
Prashanth Mohan (UC Berkeley) and Suman Nath and Oriana Riva (Microsoft Research)
It is MS data of course, but… just go to Breaking for Commercials: Characterizing Mobile Advertising
Liang Zhang, Fangfei Zhou, Alan Mislove, and Ravi Sundaram (Northeastern University)
Maygh is a web-based CDN implemented with HTML5. Content is cached through HTML5 LocalStorage (5 MB with programmatic control). Novelty: no client modification at all (it runs in the browser, unlike plugins such as FireCoral). Implemented with RTMFP (Flash) and WebRTC; the key is NAT traversal via STUN.
The architecture is based on a proxy, the Maygh coordinator, maintaining a directory for content via hashing. The idea works in principle, but it has not been developed beyond a proof of concept (scalability and security are not addressed properly). Interesting read.
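The coordinator's directory role can be pictured as a map from content hashes to the browsers currently caching that content. A minimal sketch (class and method names invented; the real protocol work, NAT traversal included, is elided):

```python
import hashlib

class Coordinator:
    """Directory mapping content hashes to the peers caching them."""
    def __init__(self):
        self.directory = {}                  # hash -> set of peer ids

    def announce(self, peer, content):
        # A browser reports the content it holds in LocalStorage.
        h = hashlib.sha1(content).hexdigest()
        self.directory.setdefault(h, set()).add(peer)
        return h

    def lookup(self, h):
        # Return some peer holding the content, or None (fetch origin).
        peers = self.directory.get(h)
        return next(iter(peers)) if peers else None

c = Coordinator()
h = c.announce("browser-1", b"logo.png bytes")
peer = c.lookup(h)          # redirected to browser-1
miss = c.lookup("0" * 40)   # nobody has it: fall back to the origin
```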
Wednesday 17th April
Session 1: Virtualization
Jean-Pascal Billaud and Ajay Gulati (VMware, Inc.)
Gernot Heiser, Etienne Le Sueur, Adrian Danis, and Aleksander Budzynowski (NICTA and UNSW) and Tudor-Ioan Salomie and Gustavo Alonso (ETH Zurich)
Tudor-Ioan Salomie, Gustavo Alonso, and Timothy Roscoe (ETH Zurich) and Kevin Elphinstone (UNSW and NICTA)
Currently many applications (e.g. databases) are not designed to work fairly in a virtualized environment, where resources such as memory are shared and dynamically assigned. Instead, they grab hold of the resources, whereas in many cases they could work with much less memory and perform similarly.
The paper proposes a technique for 'ballooning' applications, so that the amount of memory assigned to them can be expanded or squeezed. It requires modifying the applications; it was developed for MySQL and the OpenJDK JVM, over Xen. Very interesting paper.
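From the application's side, the squeeze amounts to reacting to a new memory target by shedding reclaimable state, e.g. cache entries. A toy sketch of that reaction (the class and the entry-count "target" are my simplifications; the real systems resize buffer pools and heaps in bytes):

```python
from collections import OrderedDict

class BalloonedCache:
    """An LRU-ish cache that shrinks when the balloon inflates."""
    def __init__(self, target):
        self.target = target                 # max entries allowed
        self.cache = OrderedDict()

    def put(self, key, value):
        self.cache[key] = value
        self._trim()

    def set_target(self, target):
        # Called when the hypervisor asks for memory back (or returns it).
        self.target = target
        self._trim()

    def _trim(self):
        while len(self.cache) > self.target:
            self.cache.popitem(last=False)   # evict oldest entry

c = BalloonedCache(target=4)
for i in range(4):
    c.put(i, i)
c.set_target(2)              # the balloon squeezes the application
remaining = list(c.cache)    # only the two newest entries survive
```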
Session 2: Scheduling and performance isolation
[BEST STUDENT PAPER AWARD]
Malte Schwarzkopf (University of Cambridge Computer Laboratory), Andy Konwinski (University of California Berkeley), and Michael Abd-el-Malek and John Wilkes (Google Inc.)
Omega is the upcoming scheduler for Google datacenters. It performs heterogeneous scheduling, segregating types of jobs, batch and services, with priority for batch (as they are orders of magnitude more numerous). Solution: multiple schedulers with shared state and optimistic concurrency. They had to add some optimizations because constraints cause an enormous number of conflicts when simulating realistic scenarios. The approach allows custom schedulers per application type, and they show an example that improves MapReduce scheduling by playing with the patterns in user preferences. Very good paper, well-deserved award.
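The shared-state/optimistic-concurrency mechanism can be sketched in miniature: every scheduler sees the whole cell, tries to claim machines transactionally, and retries when another scheduler got there first. A heavily simplified sketch (per-machine version counters are my stand-in for Omega's conflict detection):

```python
# cell: machine -> job currently placed on it (None if free)
cell = {"m1": None, "m2": None}
versions = {"m1": 0, "m2": 0}    # bumped on every successful commit

def try_claim(cell, versions, machine, job, seen_version):
    # Commit only if nobody changed the machine since we read it.
    if versions[machine] != seen_version or cell[machine] is not None:
        return False             # optimistic-concurrency conflict: retry
    cell[machine] = job
    versions[machine] += 1
    return True

ok_batch = try_claim(cell, versions, "m1", "batch-job", 0)    # wins m1
ok_svc = try_claim(cell, versions, "m1", "service-job", 0)    # conflict
retry = try_claim(cell, versions, "m2", "service-job", 0)     # retries on m2
```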
Ali Ghodsi, Matei Zaharia, Scott Shenker, and Ion Stoica (UC Berkeley)
Same problem: resource allocation in multi-tenant datacenters. They apply a constrained max-min fairness algorithm to guarantee fairness. The algorithm recursively maximizes the allocation of the user with the fewest machines. Nice flow-filling model: an offline optimum, and an online approximation. A very interesting contrast with the previous paper: one very practical, based on the real Google workload, and this one much more academic, on the resource allocation problem.
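Unconstrained max-min fairness by progressive filling is the baseline the paper builds on; a short sketch of that baseline (placement constraints, the paper's actual contribution, are omitted here):

```python
# Progressive filling: repeatedly split the remaining capacity equally
# among unsatisfied users, freezing anyone whose demand is met, so the
# smallest allocation is maximized first.
def max_min_fair(capacity, demands):
    alloc = {u: 0.0 for u in demands}
    active = list(demands)
    while active and capacity > 1e-9:
        share = capacity / len(active)
        satisfied = [u for u in active if demands[u] - alloc[u] <= share]
        if not satisfied:
            # nobody gets capped: split the rest equally and stop
            for u in active:
                alloc[u] += share
            break
        for u in satisfied:
            capacity -= demands[u] - alloc[u]
            alloc[u] = demands[u]
            active.remove(u)
    return alloc

# A and B are satisfied in full; C gets the leftover capacity
a = max_min_fair(10.0, {"A": 2.0, "B": 4.0, "C": 10.0})
```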
Xiao Zhang, Eric Tune, Robert Hagmann, Rohit Jnagal, Vrigo Gokhale, and John Wilkes (Google, Inc.)
Performance isolation does not behave perfectly in practice because of contended resources such as cache memory. They have implemented detection of problematic processes by monitoring hardware performance counters, ideally throttling the culprits.
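The detection half can be pictured as outlier spotting on a counter-derived metric such as cycles per instruction (CPI). A toy sketch (the k·σ rule, the threshold value, and all numbers are my invention, not the paper's statistical model):

```python
import statistics

def find_suffering(samples, k=1.5):
    """Flag tasks whose CPI is far above the group mean (> mean + k*sigma),
    i.e. tasks likely suffering cache interference."""
    mean = statistics.mean(samples.values())
    sd = statistics.pstdev(samples.values())
    return sorted(t for t, cpi in samples.items() if cpi > mean + k * sd)

# hypothetical CPI samples for five tasks of the same job
victims = find_suffering({"t1": 1.0, "t2": 1.1, "t3": 0.9,
                          "t4": 1.0, "t5": 3.5})
```

Once a suffering task is found, the system can look for co-located tasks to throttle.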