<?xml version="1.0" encoding="utf-8"?> 
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
 <title type="text">SHRIPHANI PALAKODETY: Posts tagged 'google-research'</title>
 <link rel="self" href="http://blog.shriphani.com/feeds/google-research.atom.xml" />
 <link href="http://blog.shriphani.com/tags/google-research.html" />
 <id>urn:http-blog-shriphani-com:-tags-google-research-html</id>
 <updated>2013-07-24T22:54:28Z</updated>
 <entry>
  <title type="text">The Percolator Paper</title>
  <link rel="alternate" href="http://blog.shriphani.com/2013/07/24/the-percolator-paper/?utm_source=google-research&amp;utm_medium=Atom" />
  <id>urn:http-blog-shriphani-com:-2013-07-24-the-percolator-paper</id>
  <published>2013-07-24T22:54:28Z</published>
  <updated>2013-07-24T22:54:28Z</updated>
  <author>
   <name>SHRIPHANI PALAKODETY</name></author>
  <content type="html">&lt;html&gt;
&lt;p&gt;In the IR reading group this week I decided to read the Percolator paper from Google[1]. It caused quite a stir on several news-reading sites after a Google Research blog-post on the topic. Since I&amp;rsquo;ve never had the chance to read it, this is as good a time as any. &lt;strong&gt;This is not a comprehensive summary at all and lots of results here are hand-wavy. If you want to instruct yourself, please read the paper.&lt;/strong&gt;&lt;/p&gt;
&lt;!-- more--&gt;

&lt;h2 id="setting"&gt;Setting&lt;/h2&gt;

&lt;p&gt;Index updates involve making many small updates to a large data store. MapReduce and other batch-processing systems aim for amortized efficiency over large batches, so they are a poor fit for small, incremental index updates.&lt;/p&gt;

&lt;h3 id="index-construction"&gt;Index Construction:&lt;/h3&gt;

&lt;p&gt;Index construction can be summarized as accessing every page on the web and processing these pages while keeping track of invariants.&lt;/p&gt;

&lt;p&gt;If we structure this as a sequence of MapReduce jobs, we first identify duplicates, then invert links, and finally build the index itself (I will not deal with that last step here).&lt;/p&gt;

&lt;p&gt;If we would like to process a small batch of new documents and then update our index, the link-inversion step still requires a batch job over the entire repository, not just the new documents. As such, this system doesn&amp;rsquo;t lend itself to processing documents in small batches. Percolator exists to solve this problem.&lt;/p&gt;

&lt;h2 id="percolator"&gt;Percolator&lt;/h2&gt;

&lt;p&gt;Percolator provides us:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;random access to the data repository&lt;/li&gt;
 &lt;li&gt;strong consistency guarantees&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;Its components are:&lt;/p&gt;

&lt;h3 id="bigtable"&gt;BigTable:&lt;/h3&gt;

&lt;p&gt;BigTable stores structured data and is designed to scale. It is not a fully relational database, but it gives the application control over the underlying data model.&lt;/p&gt;

&lt;p&gt;The data is indexed by row and column names (choosing these is up to the application). The cells store strings; these are not interpreted, so you can marshal objects into them.&lt;/p&gt;

&lt;p&gt;An example of an object in BigTable is:&lt;/p&gt;

&lt;script src="https://gist.github.com/shriphani/6073308.js"&gt;&lt;/script&gt;

&lt;p&gt;So the row key there is the domain (written out in the reversed format the RFC allows). The column keys shown are the contents and an anchor (the keys are arbitrary and up to us).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rows:&lt;/strong&gt; Data is stored in sorted (lexicographic) order of &lt;strong&gt;row-keys&lt;/strong&gt;. Tablets are row ranges, and these are distributed across machines. Applications get locality by choosing row keys so that closely associated items fall within the same tablet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Columns:&lt;/strong&gt; Column keys are grouped into column families. Data stored in a family is usually of the same type, and access control is applied at the family level.&lt;/p&gt;

&lt;p&gt;This grouping also lets one table serve applications of various kinds (read-intensive vs. write-intensive, or privacy-preserving).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timestamps:&lt;/strong&gt; Versioning of data is achieved using timestamps. For example, we store the CNN homepage at different timestamps.&lt;/p&gt;
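
&lt;p&gt;To make the layout concrete, here is a minimal Python sketch of a table that maps (row key, column key, timestamp) to an uninterpreted string. It is a toy under my own assumptions, not the BigTable API: the class &lt;code&gt;ToyTable&lt;/code&gt;, its methods and the example row and column keys are all made up for illustration.&lt;/p&gt;

&lt;pre&gt;
```python
# Toy model of the data layout sketched above; this is not the BigTable API.
import bisect


class ToyTable:
    """Maps (row key, column key, timestamp) to an uninterpreted string."""

    def __init__(self):
        self.rows = {}  # row key, then column key, then {timestamp: value}

    def put(self, row, column, timestamp, value):
        self.rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Newest version at or before timestamp (newest overall if None)."""
        versions = self.rows.get(row, {}).get(column, {})
        stamps = sorted(versions)
        if timestamp is not None:
            stamps = stamps[:bisect.bisect_right(stamps, timestamp)]
        return versions[stamps[-1]] if stamps else None

    def scan(self, start_row, end_row):
        """Row keys in lexicographic order, from start_row up to (excluding) end_row."""
        keys = sorted(self.rows)
        lo = bisect.bisect_left(keys, start_row)
        hi = bisect.bisect_left(keys, end_row)
        return keys[lo:hi]


table = ToyTable()
table.put("com.cnn.www", "contents:", 3, "homepage as crawled at t=3")
table.put("com.cnn.www", "contents:", 5, "homepage as crawled at t=5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("com.example.www", "contents:", 4, "another page")

print(table.get("com.cnn.www", "contents:"))               # newest version
print(table.get("com.cnn.www", "contents:", timestamp=4))  # version visible at t=4
print(table.scan("com.cnn", "com.d"))                      # a row range, like a tablet
```
&lt;/pre&gt;

&lt;p&gt;Because row keys are kept sorted, a range scan touches neighbouring rows only, which is the locality argument made under Rows.&lt;/p&gt;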

&lt;p&gt;&lt;strong&gt;Transactions:&lt;/strong&gt; ACID guarantees are provided. &lt;code&gt;commit()&lt;/code&gt; and &lt;code&gt;get()&lt;/code&gt; are blocking. Thread pools are used for parallel access. Reads don&amp;rsquo;t require locking.&lt;/p&gt;

&lt;p&gt;Locking is implemented in Percolator itself rather than in BigTable. The locks themselves are stored in BigTable as well (in in-memory columns).&lt;/p&gt;

&lt;p&gt;A transaction gets its start timestamp from the timestamp oracle, and that timestamp determines which snapshot a &lt;code&gt;get()&lt;/code&gt; request sees. &lt;code&gt;set()&lt;/code&gt; requests are buffered until commit.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;commit()&lt;/code&gt; procedure is:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;Lock all the cells being written.&lt;/li&gt;
 &lt;li&gt;A &lt;code&gt;write-write&lt;/code&gt; conflict occurs when another transaction has committed a write to one of these cells after the current transaction started. Snapshot isolation guards against this, so the &lt;code&gt;commit&lt;/code&gt; aborts.&lt;/li&gt;
 &lt;li&gt;Seeing any other lock on a cell also results in an abort of the &lt;code&gt;commit&lt;/code&gt;.&lt;/li&gt;
 &lt;li&gt;If there is no conflict, write the lock and the data to the corresponding cell.&lt;/li&gt;
 &lt;li&gt;Then obtain a commit timestamp and make the write visible to readers by replacing the lock with a write record.&lt;/li&gt;&lt;/ul&gt;
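
&lt;p&gt;The steps above can be sketched as a single-process toy in Python. This is my own simplification and not Percolator&amp;rsquo;s code: the names &lt;code&gt;Oracle&lt;/code&gt;, &lt;code&gt;Store&lt;/code&gt;, &lt;code&gt;Transaction&lt;/code&gt; and &lt;code&gt;Conflict&lt;/code&gt; are hypothetical, timestamps are plain integers, everything lives in dictionaries, and reads ignore pending locks.&lt;/p&gt;

&lt;pre&gt;
```python
# Single-process toy of the commit steps listed above; not Percolator's real code.
class Conflict(Exception):
    """Raised when a commit has to abort."""


class Oracle:
    """Stand-in for the timestamp oracle: hands out increasing integers."""

    def __init__(self):
        self.now = 0

    def next(self):
        self.now += 1
        return self.now


class Store:
    def __init__(self):
        self.data = {}   # (cell, start_ts): value staged or committed
        self.lock = {}   # cell: start_ts of the transaction holding the lock
        self.write = {}  # cell: list of (commit_ts, start_ts), oldest first

    def read(self, cell, ts):
        """Value of the newest write committed at or before ts, else None."""
        for commit_ts, start_ts in reversed(self.write.get(cell, [])):
            if ts >= commit_ts:
                return self.data[(cell, start_ts)]
        return None


class Transaction:
    def __init__(self, store, oracle):
        self.store, self.oracle = store, oracle
        self.start_ts = oracle.next()
        self.buffer = {}  # set() calls are buffered until commit()

    def set(self, cell, value):
        self.buffer[cell] = value

    def get(self, cell):
        return self.store.read(cell, self.start_ts)

    def commit(self):
        store, locked = self.store, []
        try:
            for cell, value in self.buffer.items():
                records = store.write.get(cell, [])
                if records and records[-1][0] > self.start_ts:
                    raise Conflict("write-write conflict on " + cell)
                if cell in store.lock:
                    raise Conflict("cell already locked: " + cell)
                # No conflict: write the lock and the data for this cell.
                store.lock[cell] = self.start_ts
                store.data[(cell, self.start_ts)] = value
                locked.append(cell)
        except Conflict:
            for cell in locked:  # undo our own partial work before aborting
                del store.lock[cell]
                del store.data[(cell, self.start_ts)]
            raise
        commit_ts = self.oracle.next()
        for cell in locked:
            # Replacing the lock with a write record makes the data visible.
            store.write.setdefault(cell, []).append((commit_ts, self.start_ts))
            del store.lock[cell]
        return commit_ts


oracle, store = Oracle(), Store()
t1, t2 = Transaction(store, oracle), Transaction(store, oracle)
t1.set("row1/contents", "from t1")
t1.commit()
print(t2.get("row1/contents"))  # the snapshot of t2 predates the commit of t1
t2.set("row1/contents", "from t2")
try:
    t2.commit()
except Conflict as err:
    print("aborted:", err)
```
&lt;/pre&gt;

&lt;p&gt;Here the second transaction started before the first one committed, so its commit runs into a write record newer than its own start timestamp and aborts.&lt;/p&gt;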

&lt;p&gt;In the event of client failure, locks are left behind and need to be cleaned up. Each transaction designates one of its locks as the primary lock. A later transaction that runs into a leftover lock checks the primary: if the primary lock shows that the earlier transaction crashed before committing, the leftover locks can be discarded.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timestamp Oracle:&lt;/strong&gt; This is a dedicated server that hands out timestamps in strictly increasing order. It is a separate service from Chubby, Google&amp;rsquo;s lock service (roughly what ZooKeeper is elsewhere).&lt;/p&gt;

&lt;h3 id="triggers"&gt;Triggers&lt;/h3&gt;

&lt;p&gt;Percolator has a mechanism for triggering and running transactions. Observer binaries register a function (a thunk) against a set of columns, and it gets called whenever one of those columns is updated. Percolator applications are structured as a series of such observers.&lt;/p&gt;
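
&lt;p&gt;A rough Python sketch of the idea, with observers chained through columns. Everything here is hypothetical (the class &lt;code&gt;ToyNotifyingTable&lt;/code&gt;, the column names, the two observers), and the callbacks run synchronously inside &lt;code&gt;write()&lt;/code&gt;, which is a simplification of how a real system would schedule them.&lt;/p&gt;

&lt;pre&gt;
```python
# Toy, synchronous model of observers registered on columns.
class ToyNotifyingTable:
    def __init__(self):
        self.cells = {}      # (row, column): value
        self.observers = {}  # column: functions to call when it is written

    def observe(self, column, fn):
        self.observers.setdefault(column, []).append(fn)

    def write(self, row, column, value):
        self.cells[(row, column)] = value
        for fn in self.observers.get(column, []):
            fn(self, row, value)


def extract_links(table, row, html):
    # Hypothetical processing step: treat every word starting with "http" as a link.
    links = [word for word in html.split() if word.startswith("http")]
    table.write(row, "links", links)


def count_links(table, row, links):
    # Writing "links" above is what triggers this second observer.
    table.write(row, "link_count", len(links))


table = ToyNotifyingTable()
table.observe("contents", extract_links)
table.observe("links", count_links)
table.write("com.cnn.www", "contents", "see http://a.example and http://b.example")
print(table.cells[("com.cnn.www", "link_count")])
```
&lt;/pre&gt;

&lt;p&gt;One write to the contents column cascades through both observers, which is how a single crawled document can ripple through a chain of processing steps.&lt;/p&gt;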

&lt;h2 id="performance"&gt;Performance&lt;/h2&gt;

&lt;p&gt;MapReduce involves one bulk read from GFS, whereas Percolator performs 50 operations per document, resulting in a lot of RPCs.&lt;/p&gt;

&lt;p&gt;Commits also require RPCs. To reduce this count, transactions involving one column are batched together and performed in a single RPC.&lt;/p&gt;
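
&lt;p&gt;As a small illustration of why batching cuts the RPC count, here is a Python sketch that groups pending operations by column so that each group could travel in one call. The function name &lt;code&gt;batch_by_column&lt;/code&gt; and the operation tuples are made up for this example, and no real RPC is sent.&lt;/p&gt;

&lt;pre&gt;
```python
# Toy illustration of batching: one simulated RPC per column instead of per operation.
from collections import defaultdict


def batch_by_column(operations):
    """Group (column, payload) pairs so each column needs a single RPC."""
    batches = defaultdict(list)
    for column, payload in operations:
        batches[column].append(payload)
    return dict(batches)


pending = [("lock", "row1"), ("lock", "row2"), ("data", "row1"), ("lock", "row3")]
batches = batch_by_column(pending)
print(len(pending), "operations sent as", len(batches), "RPCs")
```
&lt;/pre&gt;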

&lt;p&gt;The same batching is used to serve &lt;code&gt;read()&lt;/code&gt; requests as well. Pre-fetching is another optimization that is aimed at making use of locality (in terms of the columns involved) in requests.&lt;/p&gt;

&lt;p&gt;A newly crawled document enters the searchable index faster using Percolator than when using MapReduce.&lt;/p&gt;

&lt;p&gt;Since the document enters the index quicker:&lt;/p&gt;

&lt;ul&gt;
 &lt;li&gt;The index can grow larger (no need to process the index as-is every single time).&lt;/li&gt;
 &lt;li&gt;The corpus is fresher (documents don&amp;rsquo;t spend days in the sequence of MapReduce jobs).&lt;/li&gt;
 &lt;li&gt;There is an increase in resources used but the gains are greater than they would have been using MapReduce.&lt;/li&gt;&lt;/ul&gt;

&lt;p&gt;At extremely high crawl rates, it is more efficient to use MapReduce but real systems exhibit crawl rates that make Percolator a better choice than MapReduce.&lt;/p&gt;

&lt;p&gt;Percolator is implemented on top of BigTable, so its operations are marginally slower than raw BigTable operations.&lt;/p&gt;

&lt;p&gt;The paper also provided results using the TPC-E benchmark [2] - Percolator is &lt;strong&gt;3x&lt;/strong&gt; better than the leader of the TPC-E board (although I am not sure if these numbers mean anything). This comes at the cost of a 30-fold overhead and this is a (potential) area of improvement.&lt;/p&gt;

&lt;p&gt;[1] &lt;a href="http://research.google.com/pubs/pub36726.html"&gt;http://research.google.com/pubs/pub36726.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;[2] &lt;a href="http://www.tpc.org/default.asp"&gt;http://www.tpc.org/default.asp&lt;/a&gt;&lt;/p&gt;&lt;/html&gt;</content></entry></feed>