Google Gemini

https://gemini.google.com/app/e4ee08b0fb835f40 12/25/2025, 4:21:06 PM

Study Flashcards

15 Cards

Question

What was the main goal of Uber's improvements to their Experiment Evaluation Engine?

Answer

To make the engine 100x faster by cutting p99 latency from 10 ms to 100 μs.

Question

What were the three main pain points before the improvement?

Answer

1. High latency (10ms p99 RPC calls). 2. Reliability issues (single point of failure in parameter service). 3. Developer productivity (complex prefetch mechanisms).

Question

What is the core solution Uber implemented?

Answer

Switch from remote evaluation (RPC calls) to local evaluation (embedded SDK in microservices).

Question

How does local evaluation work?

Answer

Services use an in-memory SDK to compute experiment treatments locally using cached config data instead of network calls.
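A minimal sketch of local evaluation, assuming a plain dictionary as the in-memory cache. The names `LOCAL_CONFIG`, `evaluate`, and `treatment_percent` are hypothetical, and `zlib.crc32` is only a stand-in hash for illustration; none of this is Uber's actual SDK.

```python
import zlib

# Hypothetical in-memory cache of experiment definitions.
# In a real service this would be refreshed out of band by the config system.
LOCAL_CONFIG = {
    "new_pricing_ui": {"treatment_percent": 50},
}


def evaluate(experiment_id: str, user_id: str, config: dict = LOCAL_CONFIG) -> str:
    """Return 'treatment' or 'control' using only data already in memory."""
    definition = config.get(experiment_id)
    if definition is None:
        # Unknown experiment: fall back to the default experience.
        return "control"
    # Stand-in hash for this sketch only; no network call is made anywhere.
    bucket = zlib.crc32(f"{user_id}:{experiment_id}".encode("utf-8")) % 100
    return "treatment" if bucket < definition["treatment_percent"] else "control"
```

The point of the sketch is that every step (lookup, hash, comparison) runs in process memory, so the cost is a dictionary read plus a hash rather than a network round trip.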

Question

What hashing algorithm does Uber use for consistent user bucketing?

Answer

MurmurHash3. It hashes user_id + experiment_id, then takes the result modulo the bucket count to pick a bucket (e.g., buckets 0-4 of 100 map to group A).
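The bucketing step can be sketched as follows. This is a pure-Python version of the 32-bit MurmurHash3 variant, written out so the example needs no third-party package. The key format (`user_id:experiment_id`), the 100-bucket split, and the helper names `bucket_for` and `group_for` are assumptions for illustration, not details confirmed by the source.

```python
MASK32 = 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant) of `data`, returned as an unsigned int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    rounded = n - (n % 4)

    def mix(k: int) -> int:
        k = (k * c1) & MASK32
        k = ((k << 15) | (k >> 17)) & MASK32  # rotate left by 15
        return (k * c2) & MASK32

    # Body: process the input four bytes at a time (little-endian).
    for i in range(0, rounded, 4):
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & MASK32  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1-3 bytes, if any.
    tail = data[rounded:]
    if tail:
        h ^= mix(int.from_bytes(tail, "little"))

    # Finalization: force the bits to avalanche.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def bucket_for(user_id: str, experiment_id: str, num_buckets: int = 100) -> int:
    """Map a (user, experiment) pair to a stable bucket in [0, num_buckets)."""
    key = f"{user_id}:{experiment_id}".encode("utf-8")  # assumed key format
    return murmur3_32(key) % num_buckets


def group_for(user_id: str, experiment_id: str) -> str:
    """Assumed split: buckets 0-4 (5% of traffic) go to group A."""
    return "A" if bucket_for(user_id, experiment_id) < 5 else "other"
```

Because the hash depends only on the two IDs, the same user lands in the same bucket on every service and every call, which is what makes the bucketing consistent without shared state.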

Question

What is Flipr?

Answer

Uber's configuration management system. It delivers experiment definitions to local caches in microservices through a push/pull hybrid (long-polling or notifications).
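One way to picture a push/pull hybrid is a versioned local cache that accepts updates from either path and ignores stale ones. The class and method names below are hypothetical and do not reflect Flipr's real API; the version counter is an assumption of this sketch.

```python
from typing import Callable, Dict, Tuple


class LocalConfigCache:
    """Hypothetical per-service cache of experiment definitions."""

    def __init__(self) -> None:
        self.version = 0
        self.definitions: Dict[str, dict] = {}

    def _apply(self, version: int, definitions: Dict[str, dict]) -> bool:
        # Ignore stale or duplicate updates so push and pull can race safely.
        if version <= self.version:
            return False
        self.version = version
        self.definitions = dict(definitions)
        return True

    def on_push(self, version: int, definitions: Dict[str, dict]) -> bool:
        """Push path: the config system notifies the service of a new version."""
        return self._apply(version, definitions)

    def poll(self, fetch_latest: Callable[[int], Tuple[int, Dict[str, dict]]]) -> bool:
        """Pull path: the service asks for anything newer than what it holds."""
        version, definitions = fetch_latest(self.version)
        return self._apply(version, definitions)
```

Between updates the service keeps evaluating against whatever version it already holds, which is why propagation is eventually consistent rather than instantaneous.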

Question

What is Shadow Evaluation?

Answer

A validation mode that runs the old (remote) and new (local) engines in parallel, compares their results, and requires a >99.999% match before full rollout.
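A minimal sketch of the comparison, assuming both engines can be called as plain functions of `(user_id, experiment_id)`. The function names and the way the threshold is applied are illustrative assumptions.

```python
from typing import Callable, Iterable, Tuple

Engine = Callable[[str, str], str]


def shadow_match_rate(
    requests: Iterable[Tuple[str, str]],
    old_engine: Engine,
    new_engine: Engine,
) -> float:
    """Fraction of requests where the new engine agrees with the old one."""
    total = 0
    matches = 0
    for user_id, experiment_id in requests:
        served = old_engine(user_id, experiment_id)   # result actually used
        shadow = new_engine(user_id, experiment_id)   # computed but not served
        total += 1
        if served == shadow:
            matches += 1
    return matches / total if total else 1.0


def ready_for_rollout(match_rate: float, threshold: float = 0.99999) -> bool:
    """Gate the rollout on the >99.999% match requirement."""
    return match_rate > threshold
```

During shadowing only the old engine's answer affects users, so a mismatch costs nothing beyond a logged discrepancy to investigate.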

Question

How does Uber handle log bloat from faster evaluations?

Answer

Deduplication (log only the first exposure, or an aggregate, per user/session) combined with asynchronous logging via Kafka.
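The two ideas can be sketched together: drop repeat exposures for the same user and experiment, and hand the rest to a background worker so the request path never waits on the log sink. An in-process queue stands in for Kafka here, and `ExposureLogger` and its methods are hypothetical names.

```python
import queue
import threading
from typing import Callable


class ExposureLogger:
    """Logs the first exposure per (user, experiment) off the request path."""

    def __init__(self, sink: Callable[[dict], None]) -> None:
        self._seen = set()
        self._lock = threading.Lock()
        self._queue: "queue.Queue[dict]" = queue.Queue()
        self._sink = sink  # stand-in for a Kafka producer in this sketch
        threading.Thread(target=self._drain, daemon=True).start()

    def log_exposure(self, user_id: str, experiment_id: str, treatment: str) -> bool:
        """Enqueue the event unless this pair was already logged."""
        key = (user_id, experiment_id)
        with self._lock:
            if key in self._seen:
                return False  # duplicate: nothing is written
            self._seen.add(key)
        self._queue.put(
            {"user_id": user_id, "experiment_id": experiment_id, "treatment": treatment}
        )
        return True

    def _drain(self) -> None:
        # Background worker: ships events to the sink one at a time.
        while True:
            event = self._queue.get()
            self._sink(event)
            self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued event has reached the sink."""
        self._queue.join()
```

A production version would also need to bound or expire the `_seen` set (for example per session), since this sketch keeps every key in memory indefinitely.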

Question

What type of consistency does Uber accept for config propagation?

Answer

Eventual consistency (few seconds delay), prioritizing availability and low latency over strict consistency.

Question

How does Uber ensure session consistency across services?

Answer

Propagate treatment decisions in request headers or context, so downstream services use passed values instead of recomputing.
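A rough sketch of that propagation, assuming treatments travel as a JSON object in a single request header. The header name `x-experiment-treatments` and both helper functions are made up for illustration.

```python
import json
from typing import Callable, Dict

TREATMENT_HEADER = "x-experiment-treatments"  # hypothetical header name


def attach_treatments(headers: Dict[str, str], treatments: Dict[str, str]) -> Dict[str, str]:
    """Upstream service: copy the headers and add the decisions already made."""
    out = dict(headers)
    out[TREATMENT_HEADER] = json.dumps(treatments, sort_keys=True)
    return out


def resolve_treatment(
    headers: Dict[str, str],
    experiment_id: str,
    compute_locally: Callable[[str], str],
) -> str:
    """Downstream service: prefer the passed decision, else evaluate locally."""
    raw = headers.get(TREATMENT_HEADER)
    if raw:
        passed = json.loads(raw)
        if experiment_id in passed:
            return passed[experiment_id]
    return compute_locally(experiment_id)
```

Preferring the passed value matters during config propagation: two services may briefly hold different config versions, and recomputing could give the same request two different treatments.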

Question

What is 'Blast Radius' control in Uber's rollout?

Answer

Limiting how much can break at once: roll out in stages (per cluster/region), monitor error rates, and roll back automatically if issues are detected.
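The control loop can be sketched as below. The callbacks (`deploy`, `error_rate`, `rollback`) and the 1% error threshold are placeholders chosen for this example, not values from the source.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[str],
    deploy: Callable[[str], None],
    error_rate: Callable[[str], float],
    rollback: Callable[[str], None],
    max_error_rate: float = 0.01,  # assumed threshold for this sketch
) -> bool:
    """Deploy stage by stage; undo everything if any stage looks unhealthy."""
    completed = []
    for stage in stages:
        deploy(stage)
        completed.append(stage)
        if error_rate(stage) > max_error_rate:
            # Roll back in reverse order, most recent stage first.
            for done in reversed(completed):
                rollback(done)
            return False
    return True
```

Because each stage is checked before the next begins, a bad change is stopped after reaching only the clusters or regions deployed so far.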

Question

Why is local evaluation faster?

Answer

Eliminates network latency of RPC calls; computations happen in RAM using pre-loaded rules.

Question

What data structure optimizations were made for memory?

Answer

Compact structures for experiment definitions to minimize RAM usage in microservices.

Question

How long does config propagation typically take?

Answer

A few seconds globally via Flipr's push mechanism to edge nodes/sidecars.

Question

What business impacts did the change have?

Answer

The ability to scale to thousands of experiments, and a smoother user experience (e.g., in pricing and matching).