Doing Data Science


Cathy O'Neil - 2013
    But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.Topics include:Statistical inference, exploratory data analysis, and the data science processAlgorithmsSpam filters, Naive Bayes, and data wranglingLogistic regressionFinancial modelingRecommendation engines and causalityData visualizationSocial networks and data journalismData engineering, MapReduce, Pregel, and HadoopDoing Data Science is collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Infonomics: How to Monetize, Manage, and Measure Information as an Asset for Competitive Advantage


Douglas B. Laney - 2017
    They report to the board on the health of their workforce, their financials, their customers, and their partnerships, but rarely the health of their information assets. Corporations typically exhibit greater discipline in tracking and accounting for their office furniture than their data. Infonomics is the theory, study, and discipline of asserting economic significance to information. It strives to apply both economic and asset management principles and practices to the valuation, handling, and deployment of information assets. This book specifically shows: CEOs and business leaders how to more fully wield information as a corporate asset CIOs how to improve the flow and accessibility of information CFOs how to help their organizations measure the actual and latent value in their information assets. More directly, this book is for the burgeoning force of chief data officers (CDOs) and other information and analytics leaders in their valiant struggle to help their organizations become more infosavvy. Author Douglas Laney has spent years researching and developing Infonomics and advising organizations on the infinite opportunities to monetize, manage, and measure information. This book delivers a set of new ideas, frameworks, evidence, and even approaches adapted from other disciplines on how to administer, wield, and understand the value of information. Infonomics can help organizations not only to better develop, sell, and market their offerings, but to transform their organizations altogether.

Algorithms to Live By: The Computer Science of Human Decisions


Brian Christian - 2016
    What should we do, or leave undone, in a day or a lifetime? How much messiness should we accept? What balance of new activities and familiar favorites is the most fulfilling? These may seem like uniquely human quandaries, but they are not: computers, too, face the same constraints, so computer scientists have been grappling with their version of such issues for decades. And the solutions they've found have much to teach us.In a dazzlingly interdisciplinary work, acclaimed author Brian Christian and cognitive scientist Tom Griffiths show how the algorithms used by computers can also untangle very human questions. They explain how to have better hunches and when to leave things to chance, how to deal with overwhelming choices and how best to connect with others. From finding a spouse to finding a parking spot, from organizing one's inbox to understanding the workings of memory, Algorithms to Live By transforms the wisdom of computer science into strategies for human living.

Beautiful Code: Leading Programmers Explain How They Think


Andy OramLincoln Stein - 2007
    You will be able to look over the shoulder of major coding and design experts to see problems through their eyes.This is not simply another design patterns book, or another software engineering treatise on the right and wrong way to do things. The authors think aloud as they work through their project's architecture, the tradeoffs made in its construction, and when it was important to break rules. Beautiful Code is an opportunity for master coders to tell their story. All author royalties will be donated to Amnesty International.

Understanding Variation: The Key to Managing Chaos


Donald J. Wheeler - 1993
    But before numerical information can be useful it must be analyzed, interpreted, and assimilated. Unfortunately, teaching the techniques for making sense of data has been neglected at all levels of our educational system. As a result, through our culture there is little appreciation of how to effectively use the volumes of data generated by both business and government. This book can remedy that situation. Readers report that this book as changed both the way they look a data and the very form their monthly reports. It has turned arguments about the numbers into a common understanding of what needs to be done about them. These techniques and benefits have been thoroughly proven in a wide variety of settings. Read this book and use the techniques to gain the benefits for your company.

The Seven Pillars of Statistical Wisdom


Stephen M. Stigler - 2016
    It allows one to gain information by discarding information, namely, the individuality of the observations. Stigler s second pillar, information measurement, challenges the importance of big data by noting that observations are not all equally important: the amount of information in a data set is often proportional to only the square root of the number of observations, not the absolute number. The third idea is likelihood, the calibration of inferences with the use of probability. Intercomparison is the principle that statistical comparisons do not need to be made with respect to an external standard. The fifth pillar is regression, both a paradox (tall parents on average produce shorter children; tall children on average have shorter parents) and the basis of inference, including Bayesian inference and causal reasoning. The sixth concept captures the importance of experimental design for example, by recognizing the gains to be had from a combinatorial approach with rigorous randomization. The seventh idea is the residual the notion that a complicated phenomenon can be simplified by subtracting the effect of known causes, leaving a residual phenomenon that can be explained more easily.The Seven Pillars of Statistical Wisdom presents an original, unified account of statistical science that will fascinate the interested layperson and engage the professional statistician."

Electric Machinery Fundamentals


Stephen J. Chapman - 1991
    MATLAB has been incorporated throughtout, both in examples and problems.

I Heart Logs: Event Data, Stream Processing, and Data Integration


Jay Kreps - 2014
    Even though most engineers don't think much about them, this short book shows you why logs are worthy of your attention.Based on his popular blog posts, LinkedIn principal engineer Jay Kreps shows you how logs work in distributed systems, and then delivers practical applications of these concepts in a variety of common uses--data integration, enterprise architecture, real-time stream processing, data system design, and abstract computing models.Go ahead and take the plunge with logs; you're going love them.Learn how logs are used for programmatic access in databases and distributed systemsDiscover solutions to the huge data integration problem when more data of more varieties meet more systemsUnderstand why logs are at the heart of real-time stream processingLearn the role of a log in the internals of online data systemsExplore how Jay Kreps applies these ideas to his own work on data infrastructure systems at LinkedIn

Rebooting AI: Building Artificial Intelligence We Can Trust


Gary F. Marcus - 2019
    Professors Gary Marcus and Ernest Davis have spent their careers at the forefront of AI research and have witnessed some of the greatest milestones in the field, but they argue that a computer winning in games like Jeopardy and go does not signal that we are on the doorstep of fully autonomous cars or superintelligent machines. The achievements in the field thus far have occurred in closed systems with fixed sets of rules. These approaches are too narrow to achieve genuine intelligence. The world we live in is wildly complex and open-ended. How can we bridge this gap? What will the consequences be when we do? Marcus and Davis show us what we need to first accomplish before we get there and argue that if we are wise along the way, we won't need to worry about a future of machine overlords. If we heed their advice, humanity can create an AI that we can trust in our homes, our cars, and our doctor's offices. Reboot provides a lucid, clear-eyed assessment of the current science and offers an inspiring vision of what we can achieve and how AI can make our lives better.

NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence


Pramod J. Sadalage - 2012
    Advocates of NoSQL databases claim they can be used to build systems that are more performant, scale better, and are easier to program." ""NoSQL Distilled" is a concise but thorough introduction to this rapidly emerging technology. Pramod J. Sadalage and Martin Fowler explain how NoSQL databases work and the ways that they may be a superior alternative to a traditional RDBMS. The authors provide a fast-paced guide to the concepts you need to know in order to evaluate whether NoSQL databases are right for your needs and, if so, which technologies you should explore further. The first part of the book concentrates on core concepts, including schemaless data models, aggregates, new distribution models, the CAP theorem, and map-reduce. In the second part, the authors explore architectural and design issues associated with implementing NoSQL. They also present realistic use cases that demonstrate NoSQL databases at work and feature representative examples using Riak, MongoDB, Cassandra, and Neo4j. In addition, by drawing on Pramod Sadalage's pioneering work, "NoSQL Distilled" shows how to implement evolutionary design with schema migration: an essential technique for applying NoSQL databases. The book concludes by describing how NoSQL is ushering in a new age of Polyglot Persistence, where multiple data-storage worlds coexist, and architects can choose the technology best optimized for each type of data access.

Designing Event-Driven Systems


Ben Stopford - 2018
    Many of these patterns are successful by themselves, but as this practical ebook demonstrates, they provide a more holistic and compelling approach when applied together.Author Ben Stopford explains how service-based architectures and stream processing tools such as Apache Kafka® can help you build business-critical systems.* Learn why streaming beats request-response based architectures in complex, contemporary use cases* Understand why replayable logs such as Kafka provide a backbone for both service communication and shared datasets* Explore how event collaboration and event sourcing patterns increase safety and recoverability with functional, event-driven approaches* Apply patterns including Event Sourcing and CQRS, and how to build multi-team systems with microservices and SOA using patterns such as “inside out databases” and “event streams as a source of truth”* Build service ecosystems that blend event-driven and request-driven interfaces using a replayable log and Kafka's Streams API* Scale beyond individual teams into larger, department- and company-sized architectures, using event streams as a source of truth

Non-Invasive Data Governance: The Path of Least Resistance and Greatest Success


Robert Seiner - 2014
    Data Governance should not be about command-and-control, yet at times could become invasive or threatening to the work, people and culture of an organization. Non-Invasive Data Governance™ focuses on formalizing existing accountability for the management of data and improving formal communications, protection, and quality efforts through effective stewarding of data resources. Non-Invasive Data Governance will provide you with a complete set of tools to help you deliver a successful data governance program. Learn how: Steward responsibilities can be identified and recognized, formalized, and engaged according to their existing responsibility rather than being assigned or handed to people as more work. Governance of information can be applied to existing policies, standard operating procedures, practices, and methodologies, rather than being introduced or emphasized as new processes or methods. Governance of information can support all data integration, risk management, business intelligence and master data management activities rather than imposing inconsistent rigor to these initiatives. A practical and non-threatening approach can be applied to governing information and promoting stewardship of data as a cross-organization asset. Best practices and key concepts of this non-threatening approach can be communicated effectively to leverage strengths and address opportunities to improve.

Elements Of Electrical And Mechanical Engineering


B.L. Theraja - 1999
    

Database Internals: A deep-dive into how distributed data systems work


Alex Petrov - 2019
    But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals.Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed.This book examines:Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable log structured storage engines, with differences and use-cases for eachDistributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns, from UDP to reliable consensus protocolsDatabase clusters: Discover how to achieve consistent models for replicated data

Clean Code: A Handbook of Agile Software Craftsmanship


Robert C. Martin - 2007
    But if code isn't clean, it can bring a development organization to its knees. Every year, countless hours and significant resources are lost because of poorly written code. But it doesn't have to be that way. Noted software expert Robert C. Martin presents a revolutionary paradigm with Clean Code: A Handbook of Agile Software Craftsmanship . Martin has teamed up with his colleagues from Object Mentor to distill their best agile practice of cleaning code on the fly into a book that will instill within you the values of a software craftsman and make you a better programmer but only if you work at it. What kind of work will you be doing? You'll be reading code - lots of code. And you will be challenged to think about what's right about that code, and what's wrong with it. More importantly, you will be challenged to reassess your professional values and your commitment to your craft. Clean Code is divided into three parts. The first describes the principles, patterns, and practices of writing clean code. The second part consists of several case studies of increasing complexity. Each case study is an exercise in cleaning up code - of transforming a code base that has some problems into one that is sound and efficient. The third part is the payoff: a single chapter containing a list of heuristics and "smells" gathered while creating the case studies. The result is a knowledge base that describes the way we think when we write, read, and clean code. Readers will come away from this book understanding ‣ How to tell the difference between good and bad code‣ How to write good code and how to transform bad code into good code‣ How to create good names, good functions, good objects, and good classes‣ How to format code for maximum readability ‣ How to implement complete error handling without obscuring code logic ‣ How to unit test and practice test-driven development This book is a must for any developer, software engineer, project manager, team lead, or systems analyst with an interest in producing better code.