Day Three of the QCon Conference
Day three of the QCon conference focused on NoSQL. The afternoon sessions were less interesting, and I did not find many good insights in them. Below are details of the tracks.
Booths
Formed in 2005, Aldebaran is the result of a dream: to build humanoid robots, a new humane species, for the benefit of humankind. I saw a demo of the industry-leading robot "NAO". More than 5,000 robots have been sold, at a price of $8,000 ($6,000 for developers). NAO has:
- 25 degrees of freedom, for movement
- Two cameras, to see its surroundings
- An inertial measurement unit, which lets it know whether it is upright or sitting down
- Touch sensors to detect your touch
- Four directional microphones, so it can hear you
This combination of technologies (and many others) gives NAO the ability to detect its surroundings. The presenter said they want NAO to be not only a robot but a member of your family, and they want it to be more social and intelligent enough to understand people better. The company builds both the software and the hardware on its own.
Crittercism is the world’s first mobile application performance management (APM) solution, offering both error monitoring and service monitoring solutions. Crittercism products monitor every aspect of mobile app performance, allowing developers and IT operations to deliver high-performing, highly reliable, highly available mobile apps. Crittercism offers a real-time global view of app diagnostics and app errors across iOS, Android, Windows Phone 8, hybrid, and HTML5 apps, monitoring 1 billion monthly active users.
New Relic is a software analytics company that makes sense of billions of metrics about millions of applications in real time. New Relic’s
comprehensive SaaS-based solution provides one powerful interface for web and native mobile applications and consolidates the performance
monitoring data for any chosen technology in an environment. Thousands of active accounts use New Relic’s cloud solution every day
to optimize more than 200 billion metrics across millions of applications. New Relic is pioneering a new category called Software Analytics.
Keynote
NoSQL Like There Is No Tomorrow, from Amazon
Khawaja Shams (Head of Engineering for NoSQL, Amazon) and Swami Sivasubramanian (General Manager for NoSQL, Amazon) shared experience and suggestions about how to build your own DynamoDB-scale services; in practice, they focused on how Amazon runs NoSQL databases.
In the early days, Amazon had thousands of apps managing their state in RDBMSs. RDBMSs are actually kind of cool, since they provide a simple SQL language. However, managing so many RDBMSs was very painful. In the past, Q4 (the holiday season) meant hard work at Amazon: they had to spend several months getting ready for Q4, and sometimes they needed to repartition databases.
Then the cool Dynamo project was born. (Note: Amazon's Dynamo paper, which used many cool techniques to build an eventually consistent key-value store, had a big impact on many NoSQL database systems.) Dynamo is a key-value store that aims for higher availability (five 9s, with eventual consistency) and incremental scalability: 1) no more repartitioning, 2) no need to architect apps for peak, 3) just add boxes to scale, 4) a simpler querying model -> predictable performance.
But Dynamo was not perfect. Scaling was easier, but running it was hard (installation, benchmarking, and management), and it had a big learning curve, so it did not succeed internally. The main reason is that it was software, not a service.
In 2012, Amazon announced DynamoDB, a totally new, cloud-only NoSQL database service with:
- Seamless scalability
- Easy administration
- Fast and predictable performance
It has been quite successful. The way Amazon plans a new service: 1) pretend you are launching the service, 2) set the goal and identify the customer, 3) have it reviewed by architects, 4) make sure it makes sense for the customer.
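To make the "simpler querying model" concrete, here is a minimal Python sketch of DynamoDB's key-value access pattern using boto3. This is my own illustration, not from the talk; the table name, key schema, and region are hypothetical.

```python
# A minimal sketch of DynamoDB's key-value model via boto3 (not from the talk);
# the table name, key schema, and region below are hypothetical.
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

# Assumes a table "GameSessions" already exists with partition key "player_id"
# and sort key "session_id".
table = dynamodb.Table("GameSessions")

# Write an item; DynamoDB handles partitioning and replication behind the scenes.
table.put_item(Item={
    "player_id": "player-123",
    "session_id": "2014-04-11T10:00:00Z",
    "score": 4200,
})

# Read it back with a simple key lookup -- the simpler querying model the
# speakers credited for predictable performance.
response = table.get_item(Key={
    "player_id": "player-123",
    "session_id": "2014-04-11T10:00:00Z",
})
print(response.get("Item"))
```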
Sacred Tenets for Services
- Don't compromise durability for performance
- Plan for success -> plan for scalability
- Plan for failure -> fault tolerance is key
- Consistent performance is important
- Design -> think of blast radius
- Insist on correctness
About Testing:
Testing is a lifelong journey, but testing alone is not enough. Four ways to ensure quality:
- One-box testing
- Gamma:
simulate a real workload
- Phased deployment
- Monitor:
does it still work?
Measuring customer experience is the key; don't only measure the average, also measure the 90th and 99th percentiles.
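To make the percentile point concrete, here is a small sketch (my own illustration, not from the talk) showing how an average can hide a bad tail; the latency numbers are made up.

```python
# Averages can look healthy while the tail is terrible -- hypothetical latencies.
import numpy as np

latencies = np.concatenate([
    np.random.normal(50, 5, 9900),    # most requests are fast (~50 ms)
    np.random.normal(900, 100, 100),  # a small tail of very slow requests
])

print("average:", round(latencies.mean(), 1))
print("p50, p90, p99:", np.percentile(latencies, [50, 90, 99]).round(1))
# The average looks fine, but the 99th percentile exposes the slow tail that
# real customers actually experience.
```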
The Game of Big Data Analytics, by Randy Shoup
This is a must-watch talk if you are building a data pipeline for your team. If you are in the Xbox or gaming area, it is also interesting for you. Randy was CTO at KIXEYE, and formerly Director of Google App Engine and eBay Chief Engineer. He discussed in detail how he designed a near real-time data pipeline in months using some well-known techniques, while providing high extensibility, scalability, and reliability.
Part 1: Goals of the data analytics project
KIXEYE is a free-to-play real-time strategy game company with >500 employees. The main platforms are both web and mobile. There are three high-level goals for the data analytics project to help the company grow:
- User acquisition: acquire more users, while making sure a user's estimated lifetime value is more than the cost of acquiring that user. This is achieved by 1) publisher campaigns and 2) on-platform recommendations.
- Game analytics: measure and optimize "fun" by defining and measuring many metrics, such as gameplay, features, etc., but all metrics are just proxies for fun. This is achieved through game balance, match balance (two players are equally matched), economy management, player typology, etc.
- Retention and monetization: a sustainable business, monetization drivers, and revenue recognition. This is achieved by pricing and bundling; tournament ("event") design; and recommendations.
Part 2: "Deep Thought", the new system
As you can see above, building a single telemetry system to achieve so many goals is quite a challenge. In this section, Randy discussed the design principles for their new system, called "Deep Thought". Before describing the new system, he described their v1 analytics system (the old system):
- Grew organically: built for the goal of user acquisition, then progressively loaded with many more requirements
- Idiosyncratic mix of languages, systems, and tools
- Log files -> Chukwa (from Yahoo) -> Hadoop -> Hive -> MySQL
- PHP for reports and ETL
- A single massive table with everything
And there were many issues:
- Slow queries
- No data standardization
- Very difficult to add new games, reports, ETL
- Difficult to backfill the system on error or outage
- Difficult for analysts to use, impossible for PMs, designers, etc.
Then he described the high-level goals of "Deep Thought":
- Independent scalability: logically separate service components; independently scale tiers
- Stability and outage recovery (NOTE: it is very important that you can replay to catch up on missing data)
- Tiers can completely fail with no data loss
- Every step is idempotent and replayable
- Standardization: standardized event types, fields, queries, reports (NOTE: if you know the Xbox team's telemetry system, you will know how important standardization is)
- In-stream event processing: sessionization, dimensionalization, cohorting (see below for details)
- Queryability
- Structures are simple to reason about
- Simple things are simple
- Usable by analysts, data scientists, PMs, game designers, etc.
- Extensibility
- Easy to add new games, events, fields, reports
Core capabilities
- Sessionization: all events are part of a "session"
Explicit start event, optional stop event
Game-defined semantics
- Event batching:
Events arrive in batches, associated with a session
The pipeline computes batch-level metrics
Disaggregates events
Can optionally attach batch-level metrics to each event
- Time-series aggregations
Configurable metrics, accumulated in-stream
Faster to calculate in-stream compared with MapReduce
E.g. 1-day X, 7-day X, lifetime X; total attacks, total time played
V1 aggregate + batch delta
- Dimensionalization (a small sketch follows the cohorting list below)
The pipeline assigns a unique numeric ID to each string enum value
- E.g. "Twigs" resource => ID 1234
The system handles it automatically (no need to change how games log)
- Games log strings
- The pipeline generates and maps IDs
- No configuration necessary
Fast dimensional queries
- Join on integers instead of strings
Metadata enumeration and manipulation
- Easy to enumerate all values for a field
- Merge multiple values (easy to correct data quality issues)
- TWIGS == twigs == Twigs
- Metadata tagging:
Can assign arbitrary tags to metadata
E.g. "Panzer 05" is {tank, infantry, event prize}
Enables custom views
- Cohorting:
Group players along any dimension/metric
Well beyond classic age-based cohorts
A core analytical building block
Experiment groups
User acquisition campaign tracking
Prospective modeling
Retrospective analytics
Set-based cohorting:
- Overlapping groups: >100, >300, etc.
- Exclusive groups: (100-200)
Time-based cohorting:
- People who played in the last N days
- E.g. "whale" --> spent ($$ > X) in the last N days
- Auto-expires from a group without explicit intervention
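Here is a toy Python sketch (my own, not KIXEYE's code) of two of the capabilities above: dimensionalization (string -> numeric ID, applied in-stream) and set-based cohorting. All names, thresholds, and fields are made up.

```python
# Toy illustration of dimensionalization and set-based cohorting (hypothetical).
class Dimensionalizer:
    """Assigns a stable numeric ID to each distinct string value."""
    def __init__(self):
        self.ids = {}

    def id_for(self, value):
        # Normalize so "TWIGS", "twigs" and "Twigs" map to the same dimension.
        key = value.strip().lower()
        if key not in self.ids:
            self.ids[key] = len(self.ids) + 1
        return self.ids[key]

def cohorts(player):
    """Set-based cohorting: a player can belong to several overlapping groups."""
    groups = set()
    if player["lifetime_spend"] > 100:
        groups.add("spend>100")
    if player["lifetime_spend"] > 300:
        groups.add("spend>300")           # overlaps with the group above
    if 100 <= player["lifetime_spend"] < 200:
        groups.add("spend:100-200")       # exclusive band
    if player["days_since_last_play"] <= 7:
        groups.add("active_last_7_days")  # time-based; expires automatically
    return groups

dims = Dimensionalizer()
print(dims.id_for("Twigs"), dims.id_for("TWIGS"))  # same ID for both spellings
print(cohorts({"lifetime_spend": 350, "days_since_last_play": 2}))
```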
Part 3: Implementation
Randy discussed the techniques used for the implementation. I did not record all the technical details. He made a couple of points after this session:
- They invited consultants recommended by Hortonworks to guide the implementation and also quickly trained employees to become familiar with the new technologies
- He mentioned that he will evaluate Amazon S3 storage and Spark (a very popular real-time analytics engine for Hadoop). My sense is that because he made the whole system loosely coupled and easy to extend, it will be much easier to evaluate new technologies
- Adding new technologies is painful, but the whole company was begging him to create such a new pipeline, and the team could even afford to delay commitments to customers by a couple of months to bring the new pipeline online
Ingestion: Logging Service
HTTP/JSON Endpoint
Play framework
Non-blocking, event driven
Responsibility:
Message integrity via checksums
Durability via local disk persistence
Async batch writes to Kafka topics
- Separates different types of events ({valid, invalid, unauth}) into different queues
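A minimal sketch of the "async batch writes to Kafka topics" idea, using the kafka-python client (my choice; the talk did not name a client). The topic names and broker address are hypothetical.

```python
# Route incoming HTTP/JSON batches to per-status Kafka topics (illustrative).
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka-broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def route_batch(batch, valid, authenticated):
    """Send each incoming batch to a topic based on its status."""
    if not authenticated:
        topic = "events.unauth"
    elif not valid:
        topic = "events.invalid"
    else:
        topic = "events.valid"
    # send() is asynchronous; the client batches messages to the broker.
    producer.send(topic, batch)

route_batch({"session_id": "s-1", "events": [{"type": "attack"}]},
            valid=True, authenticated=True)
producer.flush()  # make sure buffered messages are written before exit
```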
Event Log: Kafka
Persistent, replayable pipe of events (Kafka was mentioned many times at QCon; I will provide some context about other teams inside Microsoft who are using Kafka in production)
- Events stored for 7 days
Responsibility:
- Durability via replication and local disk persistence
- Replayability via commit logs
- Scalability via partitions and brokers
- Segments data for different types of processing
Transformation
Exporter:
Consumes a Kafka topic and rebroadcasts
E.g. consumes batches, rebroadcasts events
Responsibility:
- Batch validation against a JSON schema
- Syntactic validation
- Semantic validation
- Batches -> Events
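A rough sketch of the exporter step (my own illustration): consume batch messages from a Kafka topic, validate them, and rebroadcast individual events. The topic names and schema are hypothetical; it assumes the kafka-python and jsonschema packages.

```python
# Exporter sketch: batches in, validated individual events out (illustrative).
import json
from kafka import KafkaConsumer, KafkaProducer
from jsonschema import validate, ValidationError

BATCH_SCHEMA = {
    "type": "object",
    "required": ["session_id", "events"],
    "properties": {
        "session_id": {"type": "string"},
        "events": {"type": "array"},
    },
}

consumer = KafkaConsumer("events.valid",
                         bootstrap_servers="kafka-broker:9092",
                         value_deserializer=lambda v: json.loads(v.decode("utf-8")))
producer = KafkaProducer(bootstrap_servers="kafka-broker:9092",
                         value_serializer=lambda v: json.dumps(v).encode("utf-8"))

for message in consumer:
    batch = message.value
    try:
        validate(instance=batch, schema=BATCH_SCHEMA)  # syntactic validation
    except ValidationError:
        producer.send("events.invalid", batch)
        continue
    # Disaggregate: rebroadcast each event, tagged with its session.
    for event in batch["events"]:
        event["session_id"] = batch["session_id"]
        producer.send("events.expanded", event)
```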
Importer:
Sessionization
Dimensionalization
Player metadata
Cohorting
Session Store
Key-value store (Couchbase)
Fast constant-time access to sessions and players
Responsibility:
Stores sessions, players, dimensions, config
Lookups
Idempotent updates
Stores accumulated session-level metrics
Stores player history
Uses replay to recover metrics
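A toy sketch of the idempotent-update idea (my own illustration; a plain dict stands in for Couchbase). Re-applying the same batch must not double-count session metrics, so each session remembers which batches it has already seen.

```python
# Idempotent session-metric updates, with a dict standing in for the KV store.
session_store = {}  # session_id -> session document

def apply_batch(session_id, batch_id, batch_metrics):
    doc = session_store.setdefault(session_id,
                                   {"applied_batches": set(), "metrics": {}})
    if batch_id in doc["applied_batches"]:
        return  # replayed batch: already applied, do nothing (idempotent)
    for name, value in batch_metrics.items():
        doc["metrics"][name] = doc["metrics"].get(name, 0) + value
    doc["applied_batches"].add(batch_id)

apply_batch("s-1", "b-1", {"attacks": 3, "seconds_played": 120})
apply_batch("s-1", "b-1", {"attacks": 3, "seconds_played": 120})  # replay: no effect
print(session_store["s-1"]["metrics"])  # {'attacks': 3, 'seconds_played': 120}
```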
Data Storage
- Uses Camus MapReduce to load an append_events table from Kafka into HDFS every 3 minutes
- Append-only log of events
- Each event has a session-version for deduplication (remembers the event's history); see the sketch after this list
- append_events -> base_events using MapReduce
- Logical updates of base_events (since Hadoop has no physical update)
- Replayable from the beginning with no data duplication
- base_events:
- A denormalized table of all events
- Stores the original JSON + decorations
- Custom SerDes to query/extract JSON fields without materializing entire rows
- Standardized event types => lots of functionality for free
- Updates events with new metadata by swapping old partitions for new partitions
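A simplified sketch (mine, not the actual MapReduce job) of how a session-version lets append_events collapse into base_events with no duplicates: for each event key, keep only the row with the highest version.

```python
# Deduplicate an append-only event log by (session_id, event_id), keeping the
# highest session_version -- a stand-in for the append_events -> base_events job.
def build_base_events(append_events):
    latest = {}
    for event in append_events:
        key = (event["session_id"], event["event_id"])
        prev = latest.get(key)
        if prev is None or event["session_version"] > prev["session_version"]:
            latest[key] = event
    return list(latest.values())

append_events = [
    {"session_id": "s-1", "event_id": "e-1", "session_version": 1, "attacks": 2},
    {"session_id": "s-1", "event_id": "e-1", "session_version": 2, "attacks": 3},  # replayed
]
print(build_base_events(append_events))  # only the version-2 row survives
```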
Analytics and Visualization
Hive Warehouse:
Normalized event-specific and game-specific stores
Aggregated metric data for reports and analytics
Maintained through custom RTC
Amazon Redshift:
Fast ad-hoc querying
Tableau:
Simple, powerful reports
Hadoop for analytic workloads
Marcel Kornacker from Cloudera talked about recent changes in their Hadoop platform and provided benchmarks and user studies about how people use Hadoop for analytic workloads, i.e., mostly interactive, ad-hoc queries with tight response-time requirements. I think it is mostly interesting for DBMS folks and the Hadoop/HDInsight teams.
HDFS as storage for an Analytic Warehouse:
The core Hadoop HDFS storage system has had a couple of performance improvements:
- High-performance data scans from disk/memory
- Co-partitioned tables for distributed joins
Affinity groups: collocate blocks from different files on the same machine
-> creates co-partitions
- Temp-FS: writes temp data to memory, bypassing disk
-> ideal for iterative, interactive data analysis
- Short-circuit reads: bypass the DataNode protocol when reading from disk; read at 100 MB per second
- HDFS caching: access explicitly cached data without copies or checksums
- access memory-resident data at memory-bus speed (35 GB per second)
- enables in-memory processing
Parquet Columnar Storage for Hadoop
This is the new storage format. People familiar with ColumnStore might know that storing data column by column can reduce data size significantly. It is an open-source Apache project and is available for most engines, such as Impala, Hive, Pig, MR, and Cascading. It was co-created by Twitter and Cloudera (it is amazing that companies are working together) and is hosted on GitHub, with contributors from Criteo, Stripe, Berkeley, and LinkedIn, and it is used in production at Twitter and Criteo.
Optimized storage of nested data structures, patterned after Dremel's ColumnIO format (good for Twitter's use case)
Extensive set of column encodings:
- RLE and dictionary encoding in 1.2
- Delta and optimized string encodings in 2.0
- Embedded statistics: version 2.0 stores in-lined column statistics for further optimization of scan efficiency
In a demo with 1 TB of TPC-H data, the uncompressed files were 750 GB; with Parquet and Snappy, the data was only 250 GB. Note, they did not use any columnar-storage query processing techniques (which we use a lot in SQL Server).
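For a small, concrete taste of the format, here is a sketch of writing and reading Parquet with Snappy compression from Python using pyarrow. This is my own example; the talk's demo used Impala/Hive on TPC-H data, and the tiny table below is made up.

```python
# Write and read a Parquet file with Snappy compression (illustrative).
import pyarrow as pa
import pyarrow.parquet as pq

# A tiny, made-up events table; columnar storage shines when column values repeat.
table = pa.table({
    "event_type": ["attack", "attack", "login", "attack"],
    "player_id":  [123, 456, 123, 789],
    "duration_s": [12.5, 3.0, 0.4, 7.8],
})

# Column-by-column layout plus Snappy compression is what shrank the 750 GB
# TPC-H dataset to ~250 GB in the demo.
pq.write_table(table, "events.parquet", compression="snappy")

# Column pruning: read back only the columns a query needs.
subset = pq.read_table("events.parquet", columns=["event_type", "duration_s"])
print(subset.num_rows, subset.column_names)
```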
Impala: A modern, Open-Source SQL Engine
An MPP SQL query engine for the Hadoop environment (something our Cosmos team might find interesting; I think we are ahead of them by years). Version 1.3 is available for CDH4 and CDH5.
It is designed for performance (written in C++) and maintains Hadoop flexibility by using standard Hadoop components (HDFS, HBase, Metastore, YARN). It plays well with traditional BI tools and exposes ANSI SQL 95; it runs standard SQL. The execution engine is designed for efficiency and circumvents MapReduce completely. It is from Cloudera and is open source.
In-memory execution
- Aggregation results and the right-hand-side input of joins are cached
Run-time code generation
- Uses LLVM to JIT-compile the runtime-intensive parts of a query
- The effect is the same as a custom-coded query
- Removes branches
- Inlines function calls
Impala from the user's perspective
- Create tables as virtual views over data in HDFS or HBase
- Schema metadata is stored in the Metastore (shared with Hive, Pig, etc.)
- Connect via ODBC/JDBC
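A minimal sketch of querying Impala from Python. The talk only mentioned ODBC/JDBC connectivity; I am assuming the impyla package here, and the host name and table are hypothetical.

```python
# Query Impala over its DB-API interface (assumes the impyla package).
from impala.dbapi import connect

conn = connect(host="impala-daemon.example.com", port=21050)
cursor = conn.cursor()

# Standard SQL over a table defined as a view on files in HDFS.
cursor.execute("""
    SELECT event_type, COUNT(*) AS events
    FROM base_events
    GROUP BY event_type
    ORDER BY events DESC
""")
for event_type, events in cursor.fetchall():
    print(event_type, events)

cursor.close()
conn.close()
```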
Roadmap
Window functions
Nested data structures (many companies need this)
Use cases
There were a couple of use cases. During the talk and offline, people asked about comparisons with Vertica and Spark.
- Fraud detection in the finance industry: switched from DB2 to Cloudera with 300 TB on a 52-node cluster, ingesting 2 TB per day; overall $30M in savings, with storage cost going from $7K/TB to $0.5K/TB
- Media match: a leading demand-side platform
Data aggregation in 30-minute intervals for reports
Evaluated Hive/Pig: 6-hour latency, not satisfactory
Decided to use CDH/Impala to replace Netezza
- Allstate Insurance:
Interactive data exploration with Tableau
Test case: 1.1B rows, 800 columns, 2.3 TB as CSV
35-node cluster
Simple GROUP BY query: 120 seconds
Scaling Gilt with Micro-Services
Lessons and challenges that Gilt has had with a micro-service architecture. Gilt was founded in 2007 and is a top-50 Internet retailer in the flash-sales business, with ~150 engineers. In the early days, the team was using Ruby on Rails with Postgres and Memcached. Then, as the business grew, they had a couple of technical pains:
- Spikes required 1,000s of Ruby processes
- Postgres was overloaded
- Routing traffic between Ruby processes sucked.
Then, three things happened that resolved 90% of the scaling issues:
- Transition to the JVM
- The m(a/i)cro-service era started
- Dedicated data stores for each service
- Separating data into different stores with different storage tools, such as H2 and Voldemort
But the team still had pain points in the development cycle:
- 1,000 models/controllers, 200K LOC, 100s of jobs
- Lots of contributors + no ownership
- Difficult deployments with long integration cycles
- Hard to identify root causes.
So the team adopted micro-services with the following goals:
- Empower teams with ownership
- Smaller scope
- Simpler and easier deployments and rollbacks.
Right now they have ~400 services in production and have begun to transition to Scala and Play. The presenter showed a demo where each interaction on their website is a call to a micro-service. A micro-service is defined as a functional scope tied to a small number of developers.
New challenges came out of the micro-services:
- Development and testing are hard
- Lack of a dev/integration environment
The hardware is not powerful enough
- No one wants to compile 20 services
- Ownership is still lacking
- Monitoring many services is hard
- Testing is hard:
- Hard to execute functional tests between services
- Frustrating to deploy semi-manually (Capistrano)
- Scary to deploy other teams' services
The presenter described a couple of solutions to the above problems, which I did not get into in detail due to my lack of technical knowledge. Following are a couple of key points which I think are important:
- The team adopted "Go Reactive" (Docker): an extension of Linux Containers (LXC)
- Decentralization
- Simple configurations
- Much lighter than a VM
- Code ownership: code will stay in the company longer than the people
- You should build tools in 2014