Today, as part of the ACM Chennai Chapter Lectures, Dr. David Chaiken, Chief Architect at Yahoo! Inc., delivered a talk on architecture at Internet scale at the ICSR Auditorium of IIT Madras.

  • He started by sharing data points about Yahoo!: 640+ million users, 4.5 billion page views per day and 368 million user visits per month. Yahoo! Mail has over 450 million mailboxes and handles more than 5 billion deliveries per day
  • The talk was structured into six parts:
    • Science, Art & Scale
    • Competing needs: Agility & Stability
    • Cloud Infrastructure
    • Content Platform
    • Advertisement Platform
    • Data Center Innovation (like the new New York Datacenter of Yahoo!)
  • He defined availability in terms of MTTF, MTTD and MTTR, as below:

MTTF/(MTTF+MTTD+MTTR)
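
To make the formula concrete, here is a minimal Python sketch. The talk notes do not expand the acronyms, so reading them as mean time to failure, mean time to detect and mean time to repair is my assumption, and the numbers (in hours) are made up purely for illustration; they are not Yahoo! figures.

```python
def availability(mttf: float, mttd: float, mttr: float) -> float:
    """Fraction of time the service is up: MTTF / (MTTF + MTTD + MTTR).

    Assumed meanings (not spelled out in the talk notes):
    mttf = mean time to failure, mttd = mean time to detect,
    mttr = mean time to repair. All three must use the same unit.
    """
    if mttf < 0 or mttd < 0 or mttr < 0:
        raise ValueError("durations must be non-negative")
    total = mttf + mttd + mttr
    if total == 0:
        raise ValueError("at least one duration must be positive")
    return mttf / total


# Hypothetical numbers, in hours: same failure rate and repair time,
# but the second case detects the failure much faster.
slow_detection = availability(mttf=990.0, mttd=8.0, mttr=2.0)
fast_detection = availability(mttf=990.0, mttd=0.5, mttr=2.0)

print(f"slow detection: {slow_detection:.4f}")
print(f"fast detection: {fast_detection:.4f}")
```

With these made-up inputs the first case works out to 0.99 and the second to roughly 0.9975, which shows why detection time sits in the denominator alongside repair time: a failure you have not noticed yet counts as downtime just as much as one you are busy fixing.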

  • He talked about an incident in late 2008 that affected 1% or more of the users of their advertisement platforms, even after extensive testing and deployment. Their failsafe mechanisms failed too. The moral is that “if you don’t test your failsafes regularly, they don’t work”. They had a Byzantine failure, where your infrastructure itself becomes your enemy. This reinforces what you learn in computer science classes on exponential algorithms for common time
  • Science can help take even simple search results beyond ten blue links. For example, see the integrated cricket information web part that shows up when you search for MS Dhoni in Yahoo! Search
  • Yahoo! researchers and scientists contribute more to consumer products and teams than any other new media company
  • In about 10 milliseconds, Yahoo!’s backend systems gather any stored information about an incoming user based on cookie, ID or mobile number
(They have two Hadoop clusters for Yahoo! Homepage – Science and Production)

(The low-latency path bypasses the Hadoop grid and is for quickly updating content such as stock quotes and scores)

  • Summary – at Yahoo!, Hadoop is the standard for asynchronous and batch processing tasks. Over the years, he expects Hadoop to gain near-real-time update capabilities as well. Yahoo! contributes more than 70% to the Hadoop project, and Yahoo!’s cloud infrastructure, including Hadoop, is completely open source.
