Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists


Philipp K. Janert - 2010
    With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications. Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you. Use graphics to describe data with one, two, or dozens of variables. Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments. Mine data with computationally intensive methods such as simulation and clustering. Make your conclusions understandable through reports, dashboards, and other metrics programs. Understand financial calculations, including the time-value of money. Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations. Become familiar with different open source programming environments for data analysis. "Finally, a concise reference for understanding how to conquer piles of data." --Austin King, Senior Web Developer, Mozilla. "An indispensable text for aspiring data scientists." --Michael E. Driscoll, CEO/Founder, Dataspora

Sorting Things Out: Classification and Its Consequences


Geoffrey C. Bowker - 1999
    Bowker and Susan Leigh Star explore the role of categories and standards in shaping the modern world. In a clear and lively style, they investigate a variety of classification systems, including the International Classification of Diseases, the Nursing Interventions Classification, race classification under apartheid in South Africa, and the classification of viruses and of tuberculosis. The authors emphasize the role of invisibility in the process by which classification orders human interaction. They examine how categories are made and kept invisible, and how people can change this invisibility when necessary. They also explore systems of classification as part of the built information environment. Much as an urban historian would review highway permits and zoning decisions to tell a city's story, the authors review archives of classification design to understand how decisions have been made. Sorting Things Out has a moral agenda, for each standard and category valorizes some point of view and silences another. Standards and classifications produce advantage or suffering. Jobs are made and lost; some regions benefit at the expense of others. How these choices are made and how we think about that process are at the moral and political core of this work. The book is an important empirical source for understanding the building of information infrastructures.

Calling Bullshit: The Art of Skepticism in a Data-Driven World


Carl T. Bergstrom - 2020
    Now, two science professors give us the tools to dismantle misinformation and think clearly in a world of fake news and bad data. It's increasingly difficult to know what's true. Misinformation, disinformation, and fake news abound. Our media environment has become hyperpartisan. Science is conducted by press release. Startup culture elevates bullshit to high art. We are fairly well equipped to spot the sort of old-school bullshit that is based in fancy rhetoric and weasel words, but most of us don't feel qualified to challenge the avalanche of new-school bullshit presented in the language of math, science, or statistics. In Calling Bullshit, Professors Carl Bergstrom and Jevin West give us a set of powerful tools to cut through the most intimidating data. You don't need a lot of technical expertise to call out problems with data. Are the numbers or results too good or too dramatic to be true? Is the claim comparing like with like? Is it confirming your personal bias? Drawing on a deep well of expertise in statistics and computational biology, Bergstrom and West exuberantly unpack examples of selection bias and muddled data visualization, distinguish between correlation and causation, and examine the susceptibility of science to modern bullshit. We have always needed people who call bullshit when necessary, whether within a circle of friends, a community of scholars, or the citizenry of a nation. Now that bullshit has evolved, we need to relearn the art of skepticism.

Mining of Massive Datasets


Anand Rajaraman - 2011
    This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.

Probably Approximately Correct: Nature's Algorithms for Learning and Prospering in a Complex World


Leslie Valiant - 2013
    We nevertheless muddle through even in the absence of theories of how to act. But how do we do it? In Probably Approximately Correct, computer scientist Leslie Valiant presents a masterful synthesis of learning and evolution to show how both individually and collectively we not only survive, but prosper in a world as complex as our own. The key is “probably approximately correct” algorithms, a concept Valiant developed to explain how effective behavior can be learned. The model shows that pragmatically coping with a problem can provide a satisfactory solution in the absence of any theory of the problem. After all, finding a mate does not require a theory of mating. Valiant’s theory reveals the shared computational nature of evolution and learning, and sheds light on perennial questions such as nature versus nurture and the limits of artificial intelligence. Offering a powerful and elegant model that encompasses life’s complexity, Probably Approximately Correct has profound implications for how we think about behavior, cognition, biological evolution, and the possibilities and limits of human and machine intelligence.

The Algorithm Design Manual


Steven S. Skiena - 1997
    Drawing heavily on the author's own real-world experiences, the book stresses design and analysis. Coverage is divided into two parts, the first being a general guide to techniques for the design and analysis of computer algorithms. The second is a reference section, which includes a catalog of the 75 most important algorithmic problems. By browsing this catalog, readers can quickly identify what the problem they have encountered is called, what is known about it, and how they should proceed if they need to solve it. This book is ideal for the working professional who uses algorithms on a daily basis and needs a handy reference. This work can also readily be used in an upper-division course or as a student reference guide. The Algorithm Design Manual comes with a CD-ROM that contains a complete hypertext version of the full printed book, the source code and URLs for all cited implementations, and over 30 hours of audio lectures on the design and analysis of algorithms, all keyed to on-line lecture notes.

Brotopia: Breaking Up the Boys' Club of Silicon Valley


Emily Chang - 2018
    It's a "Brotopia," where men hold all the cards and make all the rules. Vastly outnumbered, women face toxic workplaces rife with discrimination and sexual harassment, where investors take meetings in hot tubs and network at sex parties. In this powerful exposé, Bloomberg TV journalist Emily Chang reveals how Silicon Valley got so sexist despite its utopian ideals, why bro culture endures despite decades of companies claiming the moral high ground (Don't Be Evil! Connect the World!)--and how women are finally starting to speak out and fight back. Drawing on her deep network of Silicon Valley insiders, Chang opens the boardroom doors of male-dominated venture capital firms like Kleiner Perkins, the subject of Ellen Pao's high-profile gender discrimination lawsuit, and Sequoia, where a partner once famously said they "won't lower their standards" just to hire women. Interviews with Facebook COO Sheryl Sandberg, YouTube CEO Susan Wojcicki, and former Yahoo! CEO Marissa Mayer--who got their start at Google, where just one in five engineers is a woman--reveal just how hard it is to crack the Silicon Ceiling. And Chang shows how women such as former Uber engineer Susan Fowler, entrepreneur Niniane Wang, and game developer Brianna Wu have risked their careers and sometimes their lives to pave a way for other women. Silicon Valley's aggressive, misogynistic, work-at-all-costs culture has shut women out of the greatest wealth creation in the history of the world. It's time to break up the boys' club. Emily Chang shows us how to fix this toxic culture--to bring down Brotopia, once and for all.

The Sciences of the Artificial


Herbert A. Simon - 1969
    There are updates throughout the book as well. These take into account important advances in cognitive psychology and the science of design while confirming and extending the book's basic thesis: that a physical symbol system has the necessary and sufficient means for intelligent action. The chapter "Economic Reality" has also been revised to reflect a change in emphasis in Simon's thinking about the respective roles of organizations and markets in economic systems. "People sometimes ask me what they should read to find out about artificial intelligence. Herbert Simon's book The Sciences of the Artificial is always on the list I give them. Every page issues a challenge to conventional thinking, and the layman who digests it well will certainly understand what the field of artificial intelligence hopes to accomplish. I recommend it in the same spirit that I recommend Freud to people who ask about psychoanalysis, or Piaget to those who ask about child psychology: If you want to learn about a subject, start by reading its founding fathers." -- George A. Miller

Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life


Albert-László Barabási - 2002
    Albert-László Barabási, the nation’s foremost expert in the new science of networks and author of Bursts, takes us on an intellectual adventure to prove that social networks, corporations, and living organisms are more similar than previously thought. Grasping a full understanding of network science will someday allow us to design blue-chip businesses, stop the outbreak of deadly diseases, and influence the exchange of ideas and information. Just as James Gleick and the Erdős–Rényi model brought the discovery of chaos theory to the general public, Linked tells the story of the true science of the future and of experiments in statistical mechanics on the internet, all vital parts of what would eventually be called the Barabási–Albert model.

Being Digital


Nicholas Negroponte - 1995
    Negroponte's fans will want to get a copy of Being Digital, which is an edited version of the 18 articles he wrote for Wired about "being digital." Negroponte's text is mostly a history of media technology rather than a set of predictions for future technologies. In the beginning, he describes the evolution of CD-ROMs, multimedia, hypermedia, HDTV (high-definition television), and more. The section on interfaces is informative, offering an up-to-date history on visual interfaces, graphics, virtual reality (VR), holograms, teleconferencing hardware, the mouse and touch-sensitive interfaces, and speech recognition. In the last chapter and the epilogue, Negroponte offers visionary insight on what "being digital" means for our future. Negroponte praises computers for their educational value but recognizes certain dangers of technological advances, such as increased software and data piracy and huge shifts in our job market that will require workers to transfer their skills to the digital medium. Overall, Being Digital provides an informative history of the rise of technology and some interesting predictions for its future.

Problem Solving with Algorithms and Data Structures Using Python


Bradley N. Miller - 2005
    It is also about Python. However, there is much more. The study of algorithms and data structures is central to understanding what computer science is all about. Learning computer science is not unlike learning any other type of difficult subject matter. The only way to be successful is through deliberate and incremental exposure to the fundamental ideas. A beginning computer scientist needs practice so that there is a thorough understanding before continuing on to the more complex parts of the curriculum. In addition, a beginner needs to be given the opportunity to be successful and gain confidence. This textbook is designed to serve as a text for a first course on data structures and algorithms, typically taught as the second course in the computer science curriculum. Even though the second course is considered more advanced than the first course, this book assumes you are beginners at this level. You may still be struggling with some of the basic ideas and skills from a first computer science course and yet be ready to further explore the discipline and continue to practice problem solving. We cover abstract data types and data structures, writing algorithms, and solving problems. We look at a number of data structures and solve classic problems that arise. The tools and techniques that you learn here will be applied over and over as you continue your study of computer science.

The Pragmatic Programmer: From Journeyman to Master


Andy Hunt - 1999
    It covers topics ranging from personal responsibility and career development to architectural techniques for keeping your code flexible and easy to adapt and reuse. Read this book, and you'll learn how to fight software rot; avoid the trap of duplicating knowledge; write flexible, dynamic, and adaptable code; avoid programming by coincidence; bullet-proof your code with contracts, assertions, and exceptions; capture real requirements; test ruthlessly and effectively; delight your users; build teams of pragmatic programmers; and make your developments more precise with automation. Written as a series of self-contained sections and filled with entertaining anecdotes, thoughtful examples, and interesting analogies, The Pragmatic Programmer illustrates the best practices and major pitfalls of many different aspects of software development. Whether you're a new coder, an experienced programmer, or a manager responsible for software projects, use these lessons daily, and you'll quickly see improvements in personal productivity, accuracy, and job satisfaction. You'll learn skills and develop habits and attitudes that form the foundation for long-term success in your career. You'll become a Pragmatic Programmer.

The Sentient Machine: The Coming Age of Artificial Intelligence


Amir Husain - 2017
    Acclaimed technologist and inventor Amir Husain explains how we can live amidst the coming age of sentient machines and artificial intelligence, and not only survive but thrive. Artificial “machine” intelligence is playing an ever-greater role in our society. We already use cruise control in our cars and automatic checkout at the drugstore, and we are unable to live without our smartphones. The discussion around AI is polarized; people think either machines will solve all problems for everyone, or they will lead us down a dark, dystopian path into total human irrelevance. Regardless of what you believe, the idea that we might bring forth intelligent creation can be intrinsically frightening. But what if our greatest role as humans so far is that of creators? Amir Husain, a brilliant inventor and computer scientist, argues that we are on the cusp of writing our next, and greatest, creation myth. It is the dawn of a new form of intellectual diversity, one that we need to embrace in order to advance the state of the art in many critical fields, including security, resource management, finance, and energy. “In The Sentient Machine, Husain prepares us for a brighter future; not with hyperbole about right and wrong, but with serious arguments about risk and potential” (Dr. Greg Hyslop, Chief Technology Officer, The Boeing Company). He addresses broad existential questions surrounding the coming of AI: Why are we valuable? What can we create in this world? How are we intelligent? What constitutes progress for us? And how might we fail to progress? Husain boils down complex computer science and AI concepts into clear, plainspoken language and draws from a wide variety of cultural and historical references to illustrate his points. Ultimately, Husain challenges many of our societal norms and upends assumptions we hold about “the good life.”

How to Solve It: A New Aspect of Mathematical Method


George Pólya - 1944
    Polya, How to Solve It will show anyone in any field how to think straight. In lucid and appealing prose, Polya reveals how the mathematical method of demonstrating a proof or finding an unknown can be of help in attacking any problem that can be reasoned out--from building a bridge to winning a game of anagrams. Generations of readers have relished Polya's deft--indeed, brilliant--instructions on stripping away irrelevancies and going straight to the heart of the problem.

The Second Self: Computers & the Human Spirit (20th Anniversary)


Sherry Turkle - 1984
    Technology, she writes, catalyzes changes not only in what we do but in how we think. First published in 1984, The Second Self is still essential reading as a primer in the psychology of computation. This twentieth anniversary edition allows us to reconsider two decades of computer culture--to (re)experience what was and is most novel in our new media culture and to view our own contemporary relationship with technology with fresh eyes. Turkle frames this classic work with a new introduction, a new epilogue, and extensive notes added to the original text. Turkle talks to children, college students, engineers, AI scientists, hackers, and personal computer owners--people confronting machines that seem to think and at the same time suggest a new way for us to think--about human thought, emotion, memory, and understanding. Her interviews reveal that we experience computers as being on the border between inanimate and animate, as both an extension of the self and part of the external world. Their special place betwixt and between traditional categories is part of what makes them compelling and evocative. (In the introduction to this edition, Turkle quotes a PDA user as saying, "When my Palm crashed, it was like a death. I thought I had lost my mind.") Why we think of the workings of a machine in psychological terms--how this happens, and what it means for all of us--is the ever more timely subject of The Second Self.