Hadoop Usage at Last.fm
    Last.fm: The Social Music Revolution
    Hadoop at Last.fm
    Generating Charts with Hadoop
    The Track Statistics Program
        Calculating the number of unique listeners
            UniqueListenerMapper
            UniqueListenersReducer
        Summing the track totals
            SumMapper
            SumReducer
        Merging the results
            MergeListenersMapper
            IdentityMapper
            SumReducer
    Summary
Hadoop and Hive at Facebook
    Introduction
    Hadoop at Facebook
        History
        Use cases
        Data architecture
        Hadoop configuration
    Hypothetical Use Case Studies
        Advertiser insights and performance
        Ad hoc analysis and product feedback
        Data analysis
    Hive
        Overview
        Data organization
        Query language
        Data pipelines using Hive
    Problems and Future Work
        Fair sharing
        Space management
        Scribe-HDFS integration
        Improvements to Hive
Nutch Search Engine
    Background
    Data Structures
        CrawlDb
        LinkDb
        Segments
    Selected Examples of Hadoop Data Processing in Nutch
        Link inversion
        Generation of fetchlists
            Step 1: Select, sort by score, limit by URL count per host
            Step 2: Invert, partition by host, sort randomly
        Fetcher: A multithreaded MapRunner in action
        Indexer: Using custom OutputFormat
    Summary
Log Processing at Rackspace
    Requirements/The Problem
        Logs
    Brief History
    Choosing Hadoop
    Collection and Storage
        Log collection
        Log storage
    MapReduce for Logs
        Processing
            Phase 1: Map
            Phase 1: Reduce
            Phase 2: Map
            Phase 2: Reduce
        Merging for near-term search
            Sharding
            Search results
        Archiving for analysis
Cascading
    Fields, Tuples, and Pipes
    Operations
    Taps, Schemes, and Flows
    Cascading in Practice
    Flexibility
    Hadoop and Cascading at ShareThis
    Summary
TeraByte Sort on Apache Hadoop
Using Pig and Wukong to Explore Billion-edge Network Graphs
    Measuring Community
    Everybody’s Talkin’ at Me: The Twitter Reply Graph
        Edge pairs versus adjacency list
        Degree
    Symmetric Links
    Community Extraction
        Get neighbors
        Community metrics and the 1 million × 1 million problem
        Local properties at global scale