

Title: Doing data science / Rachel Schutt and Cathy O'Neil.
Main Entry: O'Neil, Cathy, author.
Schutt, Rachel, 1976- joint author.

Publisher: O'Reilly Media
Publication Date: 2014
Publication Place: Beijing
ISBN: 1449358659
9781449358655
9781449358631 (ebook)

Subject: Big data.
Data mining.
Information science.
Data structures (Computer science)
Database management.
Cyberinfrastructure.
Computers and IT.
Datenanalyse

Edition: First edition.
Contents: Big Data and Data Science Hype -- Getting Past the Hype -- Why Now? -- Datafication -- The Current Landscape (with a Little History) -- Data Science Jobs -- A Data Science Profile -- Thought Experiment: Meta-Definition -- OK, So What Is a Data Scientist, Really? -- In Academia -- In Industry -- Statistical Thinking in the Age of Big Data -- Statistical Inference -- Populations and Samples -- Populations and Samples of Big Data -- Big Data Can Mean Big Assumptions -- Modeling -- Exploratory Data Analysis -- Philosophy of Exploratory Data Analysis -- Exercise: EDA -- The Data Science Process -- A Data Scientist's Role in This Process -- Thought Experiment: How Would You Simulate Chaos? -- Case Study: RealDirect -- How Does RealDirect Make Money? -- Exercise: RealDirect Data Strategy -- Machine Learning Algorithms -- Three Basic Algorithms -- Linear Regression -- k-Nearest Neighbors (k-NN) -- k-means -- Exercise: Basic Machine Learning Algorithms.
Thought Experiment -- Financial Modeling -- In-Sample, Out-of-Sample, and Causality -- Preparing Financial Data -- Log Returns -- Example: The S & P Index -- Working out a Volatility Measurement -- Exponential Downweighting -- The Financial Modeling Feedback Loop -- Why Regression? -- Adding Priors -- A Baby Model -- Exercise: GetGlue and Timestamped Event Data -- Exercise: Financial Data -- William Cukierski -- Background: Data Science Competitions -- Background: Crowdsourcing -- The Kaggle Model -- A Single Contestant -- Their Customers -- Thought Experiment: What Are the Ethical Implications of a Robo-Grader? -- Feature Selection -- Example: User Retention -- Filters -- Wrappers -- Embedded Methods: Decision Trees -- Entropy -- The Decision Tree Algorithm -- Handling Continuous Variables in Decision Trees -- Random Forests -- User Retention: Interpretability Versus Predictive Power -- David Huffaker: Google's Hybrid Approach to Social Research -- Moving from Descriptive to Predictive -- Social at Google.
Privacy -- Thought Experiment: What Is the Best Way to Decrease Concern and Increase Understanding and Control? -- A Real-World Recommendation Engine -- Nearest Neighbor Algorithm Review -- Some Problems with Nearest Neighbors -- Beyond Nearest Neighbor: Machine Learning Classification -- The Dimensionality Problem -- Singular Value Decomposition (SVD) -- Important Properties of SVD -- Principal Component Analysis (PCA) -- Alternating Least Squares -- Fix V and Update U -- Last Thoughts on These Algorithms -- Thought Experiment: Filter Bubbles -- Exercise: Build Your Own Recommendation System -- Sample Code in Python -- Data Visualization History -- Gabriel Tarde -- Mark's Thought Experiment -- What Is Data Science, Redux? -- Processing -- Franco Moretti -- A Sample of Data Visualization Projects -- Mark's Data Visualization Projects -- New York Times Lobby: Moveable Type -- Project Cascade: Lives on a Screen -- Cronkite Plaza -- eBay Transactions and Books -- Public Theater Shakespeare Machine -- Goals of These Exhibits.
Data Science and Risk -- About Square -- The Risk Challenge -- The Trouble with Performance Estimation -- Model Building Tips -- Data Visualization at Square -- Ian's Thought Experiment -- Data Visualization for the Rest of Us -- Data Visualization Exercise -- Social Network Analysis at Morningside Analytics -- Case-Attribute Data versus Social Network Data -- Social Network Analysis -- Terminology from Social Networks -- Centrality Measures -- The Industry of Centrality Measures -- Thought Experiment -- Morningside Analytics -- How Visualizations Help Us Find Schools of Fish -- More Background on Social Network Analysis from a Statistical Point of View -- Representations of Networks and Eigenvalue Centrality -- A First Example of Random Graphs: The Erdos-Renyi Model -- A Second Example of Random Graphs: The Exponential Random Graph Model -- Data Journalism -- A Bit of History on Data Journalism -- Writing Technical Journalism: Advice from an Expert -- Correlation Doesn't Imply Causation -- Asking Causal Questions.
Confounders: A Dating Example -- OK Cupid's Attempt -- The Gold Standard: Randomized Clinical Trials -- A/B Tests -- Second Best: Observational Studies -- Simpson's Paradox -- The Rubin Causal Model -- Visualizing Causality -- Definition: The Causal Effect -- Three Pieces of Advice -- Madigan's Background -- Thought Experiment -- Modern Academic Statistics -- Medical Literature and Observational Studies -- Stratification Does Not Solve the Confounder Problem -- What Do People Do About Confounding Things in Practice? -- Is There a Better Way? -- Research Experiment (Observational Medical Outcomes Partnership) -- Closing Thought Experiment -- Claudia's Data Scientist Profile -- The Life of a Chief Data Scientist -- On Being a Female Data Scientist -- Data Mining Competitions -- How to Be a Good Modeler -- Data Leakage -- Market Predictions -- Amazon Case Study: Big Spenders -- A Jewelry Sampling Problem -- IBM Customer Targeting -- Breast Cancer Detection -- Pneumonia Prediction -- How to Avoid Leakage.
Evaluating Models -- Accuracy: Meh -- Probabilities Matter, Not 0s and 1s -- Choosing an Algorithm -- A Final Example -- Parting Thoughts -- About David Crawshaw -- Thought Experiment -- MapReduce -- Word Frequency Problem -- Enter MapReduce -- Other Examples of MapReduce -- What Can't MapReduce Do? -- Pregel -- About Josh Wills -- Thought Experiment -- On Being a Data Scientist -- Data Abundance Versus Data Scarcity -- Designing Models -- Economic Interlude: Hadoop -- A Brief Introduction to Hadoop -- Cloudera -- Back to Josh: Workflow -- So How to Get Started with Hadoop? -- Process Thinking -- Naive No Longer -- Helping Hands -- Your Mileage May Vary -- Bridging Tunnels -- Some of Our Work -- What Just Happened? -- What Is Data Science (Again)? -- What Are Next-Gen Data Scientists? -- Being Problem Solvers -- Cultivating Soft Skills -- Being Question Askers -- Being an Ethical Data Scientist -- Career Advice.

Cover Image: http://images.amazon.com/images/P/1449358659.jpg
http://images.amazon.com/images/P/9781449358655.jpg


Agency: Balqa
Collection: General
Call No.: 006.3 N 4426
Item Type: Normal Circulation
Status: Available
Copy: 1
Barcode: BU201018458
Media Type: Book