Report on ACM Eurosys 2013 Conference

Technical University Library

This year Eurosys took place in Prague, Czech Republic. Eurosys is one of the main systems conferences. This year there were 28 accepted papers out of 143 (19% acceptance rate), a bit higher than previous years, but statistics did not consider submissions that did not fulfill the requirements, which has been counted in the past. Out of them only 6 european, and plenty of MSR papers.

Monday 15th April

Session 1: Large scale distributed computation I

TimeStream: Reliable Stream Computation in the Cloud

Chengping Qian (Microsoft Research Asia), Yong He (South China University of Technology), Chunzhi Su, Zhuojie Wu, and Hongyu Zhu (Shanghai Jiaotong University), Taizhi Zhang (Peking University), Lidong Zhou (Microsoft Research Asia), Yuan Yu (Microsoft Research Silicon Valley), and Zheng Zhang (Microsoft Research Asia)
Good motivation from MSR about why use stream processing: real-time heat map of latency pairwise in datacenter for network/ Infrastructure monitoring, real-time advertising, map queries, both current and previous ones, to the presented adverts.

Adverts must be reliable! more than monitoring (makes sense :) ).

Contribution focusses on resilience and fault tolerance. They build a DAG, rewritten dynamically, replacing link with hashed ones with several nodes. Not losing info, any subgraph can be substituted by an equivalent one, and reloads/recomputes missing pieces.

Several optimizations added, such as message batch aggregation, and lightweight dependency tracking from input/output, to estimate impact

Interesting work, and was well presented.

Optimus: A Dynamic Rewriting Framework for Execution Plans of Data-Parallel Computation

Qifa Ke, Michael Isard, and Yuan Yu (Microsoft Research Silicon Valley)

Motivation: there are many problems in large scale computations which cannot be known in advance. How to handle partition skew, what is the right number of tasks (e.g. Reducers?). Also, large scale matrix multiplication is argued it can cause problems with intermediate steps. Iterative co, or providing fault tolerance capabilities? The paper proposes to optimize the EPG (Execution Plan Graph) at runtime to solve these issues.

The solution for reliability is interesting,having a ‘cache’ for intermediate data, and choosing either data for next step.

BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data


Sameer Agarwal (University of California, Berkeley), Barzan Mozafari (Massachusetts Institute of Technology), Aurojit Panda (University of California, Berkeley), Henry Milner (University of California, Berkeley), Samuel Madden (Massachusetts Institute of Technology), and Ion Stoica (University of California, Berkeley)

Motivation: compute aggregate statistics on huge amounts of data.

Select WITHIN 2 seconds. Results with error, can be refined . The idea is very elegant and original, and was brilliantly presented.

Estimation time is obtained by querying over small samples and extrapolating (should be linear). So, how well does a query cover the original one? Depends on what elements it has. Computes how complete the data will be, don’t get it too well. A sample has cost and coverage. ILP problem determines what samples to get.

Session 2: Security and Privacy

IFDB: Decentralized Information Flow Control for Databases

David Schultz and Barbara Liskov (MIT CSAIL)

Information flow control, by tagging database rows with labels, stored in an extra column. It’s far from my area, but I am surprised this has not been done already. Paper is well explained, and results show that the incurred overhead is small. A problem pointed in the questions is that it requires manual tagging, which is in general extremely hard to do right.

Process Firewalls: Protecting Processes During Resource Access

 Hayawardh Vijayakumar (The Pennsylvania State University), Joshua Schiffman (Advanced Micro Devices), and Trent Jaeger (The Pennsylvania State University)

There are many threats to file access, resource access cntrol is very hard. Programmer are not going to get it right. System call level protection carries a huge overhead.  Idea: reverse threat protection approach. With introspection you protect vulnerable processes, instead of sand boxing dangerous attackers. So, declare unsafe resources for a specific process context.

It is interesting that processing firewall rules is much more efficient that implementing checks manually (because it is prone to errors). Declarative wins apparently by a huge margin.

Resolving the conflict between generality and plausibility in verified computation

Srinath Setty, Benjamin Braun, Victor Vu, and Andrew J. Blumberg (UT Austin), Bryan Parno (Microsoft Research Redmond), and Michael Walfish (UT Austin)

They propose a Cryptographic technique, Probabilistically Checkable Proof (PCP), for checking whether a server indeed performed some computation. It is not yet usable because of the huge computation cost.

Session 3: Replication

ChainReaction: A Causal+ Consistent Datastore based on Chain Replication

Sergio Almeida, Joao Leitao, and Luıs Rodrigues (INESC-ID, Instituto Superior Tecnico, Universidade Tecnica de Lisboa)

ChainReaction is a Geo-distributed K/V store that implements the existing Causal+ model for improved read performance over existing causal replication-based systems. The work extends over FAWN, and adds capabilities for a geo-distributed setup. It was well presented, and design decisions and results are clearly reflected in the paper.

Augustus: Scalable and Robust Storage for Cloud Applications

Ricardo Padilha and Fernando Pedone (University of Lugano, Switzerland)

Bizantine Failure Tolerance (BFT) would be convenient in a cloud environment, but it heavily penalizes latency. The paper proposes single-partition transactions, and multi-partition read only transactions. The restrictions in applicability look a bit severe, but it was nicely validated across different workloads. Not sure that the social network workload they generated was representative, with them choosing an arbitrary 50% chance of a connection being close, with no justification.

MDCC: Multi-Data Center Consistency

Tim Kraska, Gene Pang, and Michael Franklin (UC Berkeley), Samuel Madden (MIT), and Alan Fekete (University of Sydney)

The authors present MDCC, a replication technique that attempts to exploit two main observations of geo distributed DBs: conflicting operations are commutative, and they are actually rare, as each client often updates their own data. With these, they implement a modified version of Paxos Multi + Fast, which attempts to lessen latency by reducing in several cases the number of phases. Results point in an extensive set of experiments to a significant performance improvement over other transactional databases.

Session 4: Concurrency and Parallelism

Conversion: Multi-Version Concurrency Control for Main Memory Segments 
Timothy Merrifield and Jakob Eriksson (University of Illinois at Chicago)

Cache control has become a main bottleneck in multi-core systems. Proposal: each process handles its own working copy for concurrent memory access. If processes can afford working with a slightly out of date copy, performance can be significantly improved.

Whose Cache Line Is It Anyway? Operating System Support for Live Detection and Repair of False Sharing 
Mihir Nanavati, Mark Spear, Nathan Taylor, Shriram Rajagopalan, Dutch T. Meyer, William Aiello, and Andrew Warfield (University of British Columbia)

Writes to the same cache line from multiple processes force to write everyone to main memory. Can have a huge impact on performance in many cases. Idea: split pages in an isolated page where conflicts are, and an underlay page with no conflicts.

Adaptive Parallelism for Web Search 
Myeongjae Jeon (Rice University), Yuxiong He (Microsoft Research), Sameh Elnikety (Microsoft Research), Alan L. Cox and Scott Rixner (Rice University)

In web search services (e.g. Bing), parallelism involves querying multiple index servers for results, and aggregating them with techniques such as PageRank. However, index server queries are sequential. The paper discusses the challenges of parallelizing in-server search. I don’t think it is novel, but it is well explained.

Tuesday 16th April

Session 1: Large scale distributed computation II

Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing

Zuhair Khayyat, Karim Awara, and Amani Alonazi (King Abdullah University of Science and Technology), Hani Jamjoom and Dan Williams (IBM T. J. Watson Research Center, Yorktown Heights), and Panos Kalnis (King Abdullah University of Science and Technology)

MIzan is a Pregel-based system, implemented in C++, which optimizes execution time by dynamically migrating vertices every iteration. Each node superstep execution is profiled, so that they statistically see what nodes perform slower, and if over threshold migrate dynamically. Every worker has a match worker with less load, where all migrations are headed too. Node locations are handled through a DHT. The technique is interesting in the sense that it completely ignores graph topology. It might incidentally reduce the number of cut edges, but does it by only looking at runtime statistics. However, the destination is chosen to load balance CPU, so if the network is the bottleneck it might not be enough.

The solution works, although validation has some problems, they only looked at graphs in the 2M range, which lend to questions about why not do it in a single machine. Comparisons where against a specific version of Giraph, which Greg Malewicz pointed out it had been vastly improved in the last six months.

MeT: Workload aware elasticity for NoSQL

Francisco Cruz, Francisco Maia, Miguel Matos, Rui Oliveira, Joao Paulo, Jose Pereira, and Ricardo Vilaca (HASLab / INESC TEC and U. Minho)

MeT is an HBase extension that performs elastic configuration of slaves. Depending on the observed access patterns, it provides replication and load balancing, Monitoring runtime statistics feeds a decision algorithm, identifying suboptimal config from CPU Usage. If a problem is detected, a distribution algorithm is run to spread the replicas.

Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices

Shivaram Venkataraman (UC Berkeley), Erik Bodzsar (University of Chicago), and Indrajit Roy, Alvin AuYoung, and Robert S. Schreiber (HP Labs)

Presto is an R library that parallelizes matrix computations using distributed machines (darray construct, supporting for each). The API is lightweight, and requires minimal modification of R code. When paralellizing sparse matrices, Presto attempts to avoid the impact of skew in sparse matrices partitions.

Online repartitioning scheme, profiling each partition for optimizing further iteratons, if ratio higher than threshold, split problematic partition. It is integrated to R without modifying it by  hacking memory allocation including object headers. The contribution is not major, but is is well thought and described.

Session 2: Operating Systems Implementation

RadixVM: Scalable address spaces for multithreaded applications

Austin T. Clements, Frans Kaashoek, and Nickolai Zeldovich (MIT CSAIL)

Failure-Atomic msync(): A Simple and Efficient Mechanism for Preserving the Integrity of Durable Data

Stan Park (University of Rochester), Terence Kelly (HP Labs), and Kai Shen (University of Rochester)

Composing OS extensions safely and efficiently with Bascule

Andrew Baumann (Microsoft Research), Dongyoon Lee (University of Michigan), Pedro Fonseca (MPI Software Systems), and Jacob R. Lorch, Barry Bond, Reuben Olinsky, and Galen C. Hunt (Microsoft Research)

Session 3: Miscellaneous

Hypnos: Understanding and Treating Sleep Conflicts in Smartphone

Abhilash Jindal, Abhinav Pathak, Y. Charlie Hu, and Samuel Midkiff (Purdue University)

They analyze several sleep conflicts when the state machine of the smartphone fails, and the device is not effectively suspended. Tested on Nexus One and Galaxy S devices (3+ year old). Looks a bit weak.

Prefetching Mobile Ads: Can advertising systems afford it?

Prashanth Mohan (UC Berkeley) and Suman Nath and Oriana Riva (Microsoft Research)

It is MS data of course, but… just go to Breaking for Commercials: Characterizing Mobile Advertising

Maygh: Building a CDN from client web browsers

Liang Zhang, Fangfei Zhou, Alan Mislove, and Ravi Sundaram (Northeastern University)

Maygh is a web-based CDN implemented with HTML 5.  Content is cached through HTML 5 LocalStorage, 5 MB with programmatic control. . Novelty, no client modification at all (browser, not plugins such as firecoral). Implemented with RTMFP (Flash), WebRTC, key is NAT traversal via STUN.

Architecture is based on a proxy, the Maigh coordinator, maintaning a directory for content, via hashing. The idea works in principle, it has not been developed beyond a proof of concept (scalabiilty, security are not addressed properly). Interesting read.

Wednesday 17th April

Session 1: Virtualization

hClock: Hierarchical QoS for Packet Scheduling in a Hypervisor

Jean-Pascal Billaud and Ajay Gulati (VMware, Inc.)

RapiLog: Reducing System Complexity Through Verification

Gernot Heiser, Etienne Le Sueur, Adrian Danis, and Aleksander Budzynowski (NICTA and UNSW) and Tudor-Ioan Salomie and Gustavo Alonso (ETH Zurich)

Application Level Ballooning for Efficient Server Consolidation

Tudor-Ioan Salomie, Gustavo Alonso, and Timothy Roscoe (ETH Zurich) and Kevin Elphinstone (UNSW and NICTA)

Currently many applications (e.g. databases) are not designed to work fairly in a virtualized environment, where resources such as memory are shared and dynamically assigned. instead, they grab hold of the resources, whereas in many cases they could be working with much less memory and perform similarly.

The paper proposes a technique for ‘ballooning’ applications, so that the amount of memory assigned to them can be expanded, or squeezed. It requires modification of the applications and was developed for MySQL and the OpenJDK JVM, over Xen. Very interesting paper.

Session 2: Scheduling and performance isolation

Omega: flexible, scalable schedulers for large compute clusters


Malte Schwarzkopf (University of Cambridge Computer Laboratory), Andy Konwinski (University of California Berkeley), and Michael Abd-el-Malek and John Wilkes (Google Inc.)

Omega is the upcoming scheduler for Google datacenters. It performs heterogeneous scheduling, segregating types of jobs, batch and services, with priority for batch (as they are orders of magnitude more). Solution: multiple schedulers, with shared state, and optimistic concurrency . They had to add some optimizations because constraints cause an enormous number of conflicts up when simulating realistic scenarios. The approach allows having  custom schedulers per application type, and they show an example that improves MR scheduling playing around the pattern in user preferences. Very good paper, well-deserved award.

Choosy: Max-Min Fair Sharing for Datacenter Jobs with Constraints

Ali Ghodsi, Matei Zaharia, Scott Shenker, and Ion Stoica (UC Berkeley)

Same problem, resource allocation in multitenant datacenters. They apply a constrained max min algorithm to guarantee fairness. The algo recursively maximizes allocation for the user with fewest machines. Nice flow filling model. Offline optimum, online approximation. Very interesting contrast with the previous paper, one very practical, based on the real Google workload, and this one much more academic in the resource allocation problem.

CPI2: CPU performance isolation for shared compute clusters

Xiao Zhang, Eric Tune, Robert Hagmann, Rohit Jnagal, vrigo Gokhale, and John Wilkes (Google, Inc.)

Performance isolation does not have perfectly in practice because of all the contending resources such as cache memory. They have implemented detection of problematic processes by monitoring hw performance counters, and ideally throttle the culprits.

Report from ACM EuroSys MPM2012 workshop

Measurement, Privacy, and Mobility (MPM 2012)

Keynote from Steve Uhlig on content delivery platforms, agile network measurement, and understanding the can ecosystem, the adaptation to change in demand is slow today, so it will be better to use virtualisation technologies to manage the demand shifts. There is growing infrastructure and storage diversity which allows for universal content delivery, so virtualisation can enable mobility and agile services.
dswiss. Secure safe
Attackers have a variety of methods for accessing the data, password solutions are not enough. It is possible to scan the whole ipv4 address space in a day. Trust on cloud providers is based on social prestige. They use secure remote password (srp) in order to avoid MITM attacks on passwords even on insecure channels. In addition to that, a number of key chains and symmetric and asymmetric keys are used to enable document sharing, however if the user forgets their password AND their recovery code, the data is deleted. Encourages providers to prevent employee access to data.

David Evans, malfunction analysis and privacy attacks
Sensors in buildings have privacy implications since they are not protected. Classifying data using tags, enable reflection of physical environment and do reasoning on privacy implications of sensing in the physical world. Tags can be based on sensor, location and time. This allows for analysis of sensitivity of data in different context and using different data sources in conjunction with one another.

Miguel Nunez, Markova based location mobile prediction
Predicting trajectories is important for services such as content delivery, tourist information, weather reports etc. this has been done using raw trajectories or clustering of trajectories using semantic mapping. Using Markova models allows probabilistic prediction of sequence of states, using density joint clusters. They used microsoft geolife data set and their own data set to train and test the n-MMC which gives them about 70-80% accuracy especially for higher number of user POIs.
ANOSIP: Anonymizing the SIP Protocol
Iraklis Leontiadis (Institute Eurecom)
SIP Used often for phone conferencing, with text based call flow messages. The aim of the work is to protect the ID of user from the call portals or man in the middle attacks. Use a number of techniques to achieve this.

Online Privacy: From Users to Markets to Deployment
Dr Vijay Erramilli (Telefónica I+D Research, Spain)
Economic model of web: free service for personal data, so advertising and economy is the main driver. They want to understand monetization aspect. Check paper on arxiv. They carried out questionnaire using browser plugin to ask users about value of their actions. Highly revisited data and sites yield high gains. Conducting economics and marketing solutions to understand the ecosystem more.

Confidential Carbon Commuting
Chris Elsmore, Anil Madhavapeddy, Ian Leslie, and Amir
Understanding employee commute is important, however it is hard to collect the data. University used an app to collect user data. Personal container is used for data aggregation. It allows sensitive questions to be asked about employee habits. Check
The Impact of Trace and Adversary Models on Location Privacy Provided by K-anonymity
Volkan Cambazoglu and Christian Rohner (Uppsala University)
Used trace generation on different walk models for simulating locations. Used k-anonymity for identity protection and obfuscation for time of event hiding.

An Empirical Study on IMDb and its Communities Based on the Network of Co-Reviewers , Maryam Fatemi and Laurissa Tokarchuk
Interaction between people and content on social networks is important. There are a number of recommendation systems available but they suffer from shortcomings. A number of methods are used for comparison of movie review communities on imdb. So must take into account genres and context.

Providing Secure and Accountable Privacy to Roaming 802.11 Mobile Devices , Panagiotis Georgopoulos, Ben McCarthy, and Christopher Edwards
Mobile devices require connectivity and security. Differences in protocols and accesses point configs effect user mobility. An eduroam equivalent can work. Use CUI RFC4372. The idea is that request is anonymous access network, but relays alias to home network for authentication. Real ipv6 deployment test is done in lancaster.

When Browsing Leaves Footprints – Automatically Detect Privacy Violations
Hans Hofinger, Alexander Kiening, and Peter Schoo (Fraunhofer Research Institution for Applied and Integrated Security AISEC)
Introduced prividor, privacy violation detector via browser add-on. There are a large number of web techniques for user tracking such as cookies and scripts. A database is used for keeping track of bad sites, in addition to code checking. A centralised version is chosen for better management.