CODEZUKI

A blog about coding.

A Unique Counter With Cassandra

A common requirement is to record a unique count of the occurences of some event. Obviously to do this you have to know if it has happened before, but using the standard “set if not exists” semantic in an eventually consistent environment like Cassandra can be challenging.

Cassandra does support “INSERT … IF NOT EXISTS”, however this is intended for multiple datacenter scenarios and special cases, but it’s not performant in a cluster under high load because an agreement has to be reached between multiple nodes to decide whether or not the insert can succeed. This slows down writes enormously.

Using IF NOT EXISTS incurs a performance hit associated with using Paxos internally

– DataStax INSERT reference

Last write wins

The Cassandra read and write path is already well documented, so I’ll be brief.

When Cassandra writes data it performs no seek, only appending writes. This means that if there is more than one write to the same primary key the latest timestamp is used to decide the winning value. The last write wins. Most of the time this is exactly the behaviour you want, but not for our unique counter.

First write wins

Lets invent a somewhat contrived example. We want to count the number of users who have accepted a meeting invitation.

The table might look something like this CQL definition.

1
2
3
4
CREATE TABLE meeting_acceptance_counts (
  meeting_id uuid PRIMARY KEY,
  acceptance_count counter
)

We also need to know if we have counted this user for this meeting before because we don’t want to count the same user twice. Lets create another table, meeting_users.

1
2
3
4
5
CREATE TABLE meeting_users (
  meeting_id uuid,
  user_id uuid,
  PRIMARY KEY(meeting_id, user_id)
)

Internally to Cassandra the table looks a bit like this.

meeting_id partition key
user_id column name
CEC01088-2F5A-41E1-93D7-88AAD61B022F column value
1421668551 timestamp
ttl not used

So how do we set if not exists without using some kind of distributed lock? Another feature of the Cassandra INSERT statement is the ability to provide the value that will be used for the timestamp, instead of it being generated silently based on the current time. Given that the highest timestamp value wins, we want a value that gets smaller over time so that the first write always wins.

We can do this by using (long.MaxValue – current timestamp) as the timestamp value along with the INSERT request. Lets say we calculate this to be 9223372035433107000, then our INSERT will look like this.

1
INSERT INTO meeting_users (meeting_id, user_id) VALUES (379CCAB3-9773-44A7-8520-BF6E0D5CD56A, B874527A-DB0F-499C-BB5F-80C76DFBAAE1) USING TIMESTAMP 9223372035433107000;

Did my write win?

Now we need a way to know the result of that INSERT, did it already exist? This can be easily achieved using the WRITETIME function.

1
2
3
4
SELECT WRITETIME (user_id)
FROM meeting_users
WHERE meeting_id = 379CCAB3-9773-44A7-8520-BF6E0D5CD56A
AND user_id = B874527A-DB0F-499C-BB5F-80C76DFBAAE1

If this result matches the timestamp we provided (9223372035433107000), then this was the first write of that primary key and we can increment our counter. If it doesn’t match then the insert will effectively be ignored and the data will be dropped during compaction. Extra cost is incurred during this read to merge-out the subsequent writes until that happens.

Compaction

Compaction merges data and consolidates SSTables. To reduce the amount of read required during a query it’s probably a good idea to use the DateTieredCompactionStrategy on the unique set table (meeting_users in this example).

Feedback

If you have any opinion on why this is a terrible idea, I really love to hear it!

Comments