DataStax Enterprise has three mechanisms for making sure that your data stays in sync: hinted handoff, read repair, and anti-entropy repair.  Of these, read repair is probably the most misunderstood.  What follows is an approximate description.  The full behavior is quite complicated and depends on timing and the interactions of various mechanisms and is beyond the scope of this article.

There are broadly two types of read repair: foreground and background.  Foreground here means blocking -- we complete all operations before returning to the client.  Background means non-blocking -- we begin the background repair operation and then return to the client before it has completed.  

Foreground read repair is always performed on queries which use a consistency level greater than ONE/LOCAL_ONE. The coordinator asks one replica for data and the others for digests of their data (currently MD5).  If there's a mismatch in the data returned to the coordinator from the replicas, we resolve the situation by doing a data read from all replicas involved in the query (the number is dictated by the consistency level) and then merging the results.   It may be that a single replica has all of the latest data for each column, or that we have to assemble a new record by mixing and matching columns from different replicas.  After we've determined/created the latest version of the record, we write it back to only the replicas involved in the request.  For the common case of a LOCAL_QUORUM read with RF=3, we would have originally queried two replicas, so those will be the only two that we repair.  A query with a consistency level of ONE or LOCAL_ONE can't cause a blocking read repair because we must have a data mismatch to trigger the read repair and if we only queried one replica, there's no comparison, so no mismatch.

The two read repairs that are governed by "chance" parameters (per-table parameters for cluster-wide vs dc-local, read_repair_chance and dc_local_read_repair_chance) are both background repairs.  Background read repair happens as follows: If the read repair chance properties are not zero on a table, then during each query we ask a random number generator for a number between 0.0 and 1.0.  We first see if the random number is less than or equal to read_repair_chance.  If it is, we perform a non-blocking global read repair.  If it is not, we test to see if the random number is less than or equal to dc_local_read_repair_chance.  If it is, we perform a non-blocking read repair in the local DC only.  Notice that we use a single random value for both of these chance read repair tests and that global read repair is evaluated first.  This means that if read_repair_chance >= dc_local_read_repair_chance for a given table, you will never have DC-local background read repair for that table.

As I've stated previously, a full characterization of read repair is quite complicated.  This is due to the interaction of blocking read repair, non-blocking read repair, speculative retry, and the timing of the responses of data reads and digest reads.  To give some idea of the complexity here, you can take a look at CASSANDRA-9753 ( ).  The gist is that If you have dc_local_read_repair_chance set AND you have speculative retry enabled, you can hit CASSANDRA-9753.  This is a bug where a read request gets sent to another DC (possibly halfway around the world) and the read will block.  If tracing requests shows that you're hitting this, I recommend turning off both of the chance read repair parameters on that table, as they are less useful than Speculative Retry.

Two commands can help you look at read repair statistics.  nodetool tpstats has some information on read repair in the ReadRepairStage row, but nodetool netstats is a better place to get information for this.  This will show you stats on number of attempted read repairs and whether they were blocking or background.

For tables using either DTCS or TWCS, set both chance read repair parameters on those tables to 0.0 as read repair could inject out-of-sequence timestamps.