Hadoop Explained
Aravind Shenoy - 2014
Hadoop has allowed small and medium-sized companies to store huge amounts of data on cheap commodity servers in racks. The introduction of Big Data has allowed businesses to make decisions based on quantifiable analysis. Hadoop is now used by major organizations such as Amazon, IBM, Cloudera, and Dell, to name a few. This book introduces you to Hadoop and to concepts such as ‘MapReduce’, ‘Rack Awareness’, ‘YARN’, and ‘HDFS Federation’, which will help you get acquainted with the technology.
Bayes Theorem: A Visual Introduction For Beginners
Dan Morris - 2016
Bayesian statistics is taught in most first-year statistics classes across the nation, but there is one major problem that many students (and others interested in the theorem) face. The theorem is not intuitive for most people, and understanding how it works can be a challenge, especially because it is often taught without visual aids. In this guide, we unpack the various components of the theorem and provide a basic overview of how it works, with illustrations to help. Three scenarios (the flu, breathalyzer tests, and peacekeeping) are used throughout the booklet to teach how problems involving Bayes' Theorem can be approached and solved. Over 60 hand-drawn visuals are included to help you work through each problem as you learn by example. The illustrations are simple, hand-drawn, and in black and white. For those interested, we have also included sections typically not found in other beginner guides to Bayes' Rule. These include:
- A short tutorial on how to understand problem scenarios and find P(B), P(A), and P(B|A). For many people, knowing how to approach scenarios and break them apart can be daunting. In this booklet, we provide a quick step-by-step reference on how to confidently understand scenarios.
- A few examples of how to think like a Bayesian in everyday life. Bayes' Rule might seem somewhat abstract, but it can be applied to many areas of life and help you make better decisions. It is a great tool that can help you with critical thinking, problem-solving, and dealing with the gray areas of life.
- A concise history of Bayes' Rule. Bayes' Theorem has a fascinating history of more than 200 years, and we have summed it up for you in this booklet. From its discovery in the 1700s to its use in breaking the Germans’ Enigma code during World War II, its tale is quite phenomenal.
- Fascinating real-life stories of how Bayes' formula is used in everyday life. From search and rescue to spam filtering and driverless cars, Bayes' Rule is used in many areas of modern-day life. We have summed up three examples for you, with an illustration of how Bayes' Rule could be used.
- An expanded definitions, notation, and proof section. At the end of the booklet, we define core terms more concretely and also cover additional terms you might be confused about.
- A recommended reading section. From The Theory That Would Not Die to a few other books, we have a number of recommendations for further reading. Take a look!
If you are a visual learner and like to learn by example, this intuitive booklet might be a good fit for you. Bayesian statistics is an incredibly fascinating topic and likely touches your life every single day. It is a very important tool used in data analysis across a wide range of industries, so take an easy dive into the theorem for yourself with a visual approach! If you are looking for a short beginner's guide packed with visual examples, this booklet is for you.
Data Driven
D.J. Patil - 2015
Becoming data-driven requires you to develop a data culture that involves people throughout the organization. In this O’Reilly report, DJ Patil and Hilary Mason outline the steps you need to take if your company is to be truly data-driven, including the questions you should ask and the methods you should adopt.
You’ll learn not only how Google, LinkedIn, and Facebook use their data, but also how Walmart, UPS, and other organizations took advantage of this resource long before the advent of Big Data. No matter how you approach it, building a data culture is key to success in the 21st century.
You’ll explore:
Data scientist skills—and why every company needs a Spock
How the benefits of giving company-wide access to data outweigh the costs
Why data-driven organizations use the scientific method to explore and solve data problems
Key questions to help you develop a research-specific process for tackling important issues
What to consider when assembling your data team
Developing processes to keep your data team (and company) engaged
Choosing technologies that are powerful, support teamwork, and are easy to use and learn
Security Metrics: Replacing Fear, Uncertainty, and Doubt
Andrew Jaquith - 2007
Using sample charts, graphics, case studies, and war stories, Yankee Group security expert Andrew Jaquith demonstrates exactly how to establish effective metrics based on your organization's unique requirements. You'll discover how to quantify hard-to-measure security activities, compile and analyze all relevant data, identify strengths and weaknesses, set cost-effective priorities for improvement, and craft compelling messages for senior management. Security Metrics successfully bridges management's quantitative viewpoint with the nuts-and-bolts approach typically taken by security professionals. It brings together expert solutions drawn from Jaquith's extensive consulting work in the software, aerospace, and financial services industries, including new metrics presented nowhere else. You'll learn how to:
- Replace nonstop crisis response with a systematic approach to security improvement
- Understand the differences between "good" and "bad" metrics
- Measure coverage and control, vulnerability management, password quality, patch latency, benchmark scoring, and business-adjusted risk
- Quantify the effectiveness of security acquisition, implementation, and other program activities
- Organize, aggregate, and analyze your data to bring out key insights
- Use visualization to understand and communicate security issues more clearly
- Capture valuable data from firewalls and antivirus logs, third-party auditor reports, and other resources
- Implement balanced scorecards that present compact, holistic views of organizational security effectiveness
Whether you're an engineer or consultant responsible for security and reporting to management, or an executive who needs better information for decision-making, Security Metrics is the resource you have been searching for.
Andrew Jaquith, program manager for Yankee Group's Security Solutions and Services Decision Service, advises enterprise clients on prioritizing and managing security resources.
He also helps security vendors develop product, service, and go-to-market strategies for reaching enterprise customers. He co-founded @stake, Inc., a security consulting pioneer acquired by Symantec Corporation in 2004. His application security and metrics research has been featured in CIO, CSO, InformationWeek, IEEE Security and Privacy, and The Economist.
Contents:
Foreword
Preface
Acknowledgments
About the Author
Chapter 1 Introduction: Escaping the Hamster Wheel of Pain
Chapter 2 Defining Security Metrics
Chapter 3 Diagnosing Problems and Measuring Technical Security
Chapter 4 Measuring Program Effectiveness
Chapter 5 Analysis Techniques
Chapter 6 Visualization
Chapter 7 Automating Metrics Calculations
Chapter 8 Designing Security Scorecards
Index
Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today's Computers
John MacCormick - 2012
A simple web search picks out a handful of relevant needles from the world's biggest haystack: the billions of pages on the World Wide Web. Uploading a photo to Facebook transmits millions of pieces of information over numerous error-prone network links, yet somehow a perfect copy of the photo arrives intact. Without even knowing it, we use public-key cryptography to transmit secret information like credit card numbers; and we use digital signatures to verify the identity of the websites we visit. How do our computers perform these tasks with such ease? This is the first book to answer that question in language anyone can understand, revealing the extraordinary ideas that power our PCs, laptops, and smartphones. Using vivid examples, John MacCormick explains the fundamental "tricks" behind nine types of computer algorithms, including artificial intelligence (where we learn about the "nearest neighbor trick" and "twenty questions trick"), Google's famous PageRank algorithm (which uses the "random surfer trick"), data compression, error correction, and much more. These revolutionary algorithms have changed our world: this book unlocks their secrets, and lays bare the incredible ideas that our computers use every day.
Statistical Analysis with Excel for Dummies
Joseph Schmuller - 2005
Covers the mean, margin of error, standard deviation, permutations, and correlations, all using Excel.
Algorithms to Live By: The Computer Science of Human Decisions
Brian Christian - 2016
What should we do, or leave undone, in a day or a lifetime? How much messiness should we accept? What balance of new activities and familiar favorites is the most fulfilling? These may seem like uniquely human quandaries, but they are not: computers, too, face the same constraints, so computer scientists have been grappling with their version of such issues for decades. And the solutions they've found have much to teach us.
In a dazzlingly interdisciplinary work, acclaimed author Brian Christian and cognitive scientist Tom Griffiths show how the algorithms used by computers can also untangle very human questions. They explain how to have better hunches and when to leave things to chance, how to deal with overwhelming choices and how best to connect with others. From finding a spouse to finding a parking spot, from organizing one's inbox to understanding the workings of memory, Algorithms to Live By transforms the wisdom of computer science into strategies for human living.
Statistics: An Introduction Using R
Michael J. Crawley - 2005
R is one of the most powerful and flexible statistical software packages available, and it enables the user to apply a wide variety of statistical methods ranging from simple regression to generalized linear modelling. Statistics: An Introduction using R is a clear and concise introductory textbook on statistical analysis using this powerful and free software, and follows on from the success of the author's previous best-selling title Statistical Computing.
* Features step-by-step instructions that assume no mathematics, statistics or programming background, helping the non-statistician to fully understand the methodology.
* Uses a series of realistic examples, developing step-wise from the simplest cases, with the emphasis on checking the assumptions (e.g. constancy of variance and normality of errors) and the adequacy of the model chosen to fit the data.
* The emphasis throughout is on estimation of effect sizes and confidence intervals, rather than on hypothesis testing.
* Covers the full range of statistical techniques likely to be needed to analyse the data from research projects, including elementary material like t-tests and chi-squared tests, intermediate methods like regression and analysis of variance, and more advanced techniques like generalized linear modelling.
* Includes numerous worked examples and exercises within each chapter.
* Accompanied by a website featuring worked examples, data sets, exercises and solutions: http://www.imperial.ac.uk/bio/research/crawl...
Statistics: An Introduction using R is the first text to offer such a concise introduction to a broad array of statistical methods, at a level that is elementary enough to appeal to a broad range of disciplines. It is primarily aimed at undergraduate students in medicine, engineering, economics and biology, but will also appeal to postgraduates who have not previously covered this area, or who wish to switch to using R.
Calling Bullshit: The Art of Skepticism in a Data-Driven World
Carl T. Bergstrom - 2020
Now, two science professors give us the tools to dismantle misinformation and think clearly in a world of fake news and bad data.
It's increasingly difficult to know what's true. Misinformation, disinformation, and fake news abound. Our media environment has become hyperpartisan. Science is conducted by press release. Startup culture elevates bullshit to high art. We are fairly well equipped to spot the sort of old-school bullshit that is based in fancy rhetoric and weasel words, but most of us don't feel qualified to challenge the avalanche of new-school bullshit presented in the language of math, science, or statistics. In Calling Bullshit, Professors Carl Bergstrom and Jevin West give us a set of powerful tools to cut through the most intimidating data.
You don't need a lot of technical expertise to call out problems with data. Are the numbers or results too good or too dramatic to be true? Is the claim comparing like with like? Is it confirming your personal bias? Drawing on a deep well of expertise in statistics and computational biology, Bergstrom and West exuberantly unpack examples of selection bias and muddled data visualization, distinguish between correlation and causation, and examine the susceptibility of science to modern bullshit.
We have always needed people who call bullshit when necessary, whether within a circle of friends, a community of scholars, or the citizenry of a nation. Now that bullshit has evolved, we need to relearn the art of skepticism.
Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else
Steve Lohr - 2015
Today, data is the vital raw material of the information economy. The explosive abundance of this digital asset, more than doubling every two years, is creating a new world of opportunity and challenge.
Data-ism is about this next phase, in which vast, Internet-scale data sets are used for discovery and prediction in virtually every field. It is a journey across this emerging world, told through people, illuminating narrative examples, and insights. It shows that, if exploited, this new revolution will change the way decisions are made, relying more on data and analysis and less on intuition and experience, and will transform the nature of leadership and management.
Lohr explains how individuals and institutions will need to exploit, protect, and manage their data to stay competitive in the coming years. Filled with rich examples and anecdotes of the various ways in which the rise of Big Data is affecting everyday life, it raises provocative questions about policy and practice that have wide implications for all of our lives.
Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
Eric Redmond - 2012
As a modern application developer, you need to understand the emerging field of data management, both RDBMS and NoSQL. Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. In the tradition of Bruce A. Tate's Seven Languages in Seven Weeks, this book goes beyond your basic tutorial to explore the essential concepts at the core of each technology: Redis, Neo4J, CouchDB, MongoDB, HBase, Riak, and Postgres. With each database, you'll tackle a real-world data problem that highlights the concepts and features that make it shine. You'll explore the five data models employed by these databases (relational, key/value, columnar, document, and graph) and which kinds of problems are best suited to each. You'll learn how MongoDB and CouchDB are strikingly different, and discover the Dynamo heritage at the heart of Riak. Make your applications faster with Redis and more connected with Neo4J. Use MapReduce to solve Big Data problems. Build clusters of servers using scalable services like Amazon's Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that's more than the sum of its parts, or find one that meets all your needs at once.
Seven Databases in Seven Weeks will take you on a deep dive into each of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.
What You Need: To get the most out of this book you'll have to follow along, and that means you'll need a *nix shell (Mac OS X or Linux preferred; Windows users will need Cygwin), Java 6 (or greater), and Ruby 1.8.7 (or greater). Each chapter will list the downloads required for that database.
Data Science at the Command Line: Facing the Future with Time-Tested Tools
Jeroen Janssens - 2014
You'll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.
To get you started (whether you're on Windows, OS X, or Linux), author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.
Discover why the command line is an agile, scalable, and extensible technology. Even if you're already comfortable processing data with, say, Python or R, you'll greatly improve your data science workflow by also leveraging the power of the command line.
- Obtain data from websites, APIs, databases, and spreadsheets
- Perform scrub operations on plain text, CSV, HTML/XML, and JSON
- Explore data, compute descriptive statistics, and create visualizations
- Manage your data science workflow using Drake
- Create reusable tools from one-liners and existing Python or R code
- Parallelize and distribute data-intensive pipelines using GNU Parallel
- Model data with dimensionality reduction, clustering, regression, and classification algorithms
The Future Computed: Artificial Intelligence and its Role in Society
Microsoft Corporation - 2018
“It’s already happening in impressive ways. But as we’ve witnessed over the past 20 years, new technology also inevitably raises complex questions and broad societal concerns.” – Brad Smith and Harry Shum on The Future Computed.
“As we look to a future powered by a partnership between computers and humans, it’s important that we address these challenges head on. How do we ensure that AI is designed and used responsibly? How do we establish ethical principles to protect people? How should we govern its use? And how will AI impact employment and jobs?” – Brad Smith and Harry Shum on The Future Computed.
As artificial intelligence shows up in every aspect of our lives, Microsoft's top minds provide a guide to how we should prepare for the future. Whether you're a government leader crafting new laws, an entrepreneur looking to incorporate AI into your business, or a parent contemplating the future of education, this book explains the trends driving the AI revolution, identifies the complex ethical and workforce issues we all need to think about, and suggests a path forward.
The Future Computed: Artificial Intelligence and its Role in Society provides Microsoft’s perspective on where AI technology is going and the new societal issues it is raising: ensuring AI is designed and used responsibly, establishing ethical principles to protect people, and understanding how AI will impact employment and jobs. The principles of fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability are critical to addressing the societal impacts of AI and building trust as AI becomes more and more a part of the products and services that people use at work and at home every day. A central theme in The Future Computed is that for AI to deliver on its potential to drive widespread economic and social progress, the technology needs to be human-centered, combining the capabilities of computers with human capabilities to enable people to achieve more.
But a human-centered approach can only be realized if researchers, policymakers, and leaders from government, business, and civil society come together to develop a shared ethical framework for AI. This in turn will help foster responsible development of AI systems that will engender trust. In an increasingly AI-driven world, the question is not what computers can do; it is what computers should do. The Future Computed also draws a few conclusions as we chart our path forward. First, the companies and countries that will fare best in the AI era will be those that embrace these changes rapidly and effectively. Second, while AI will help solve big societal problems, we must look to this future with a critical eye, as there will be challenges as well as opportunities. Third, we need to act with a sense of shared responsibility, because AI won’t be created by the tech sector alone. Finally, skilling up for an AI-powered world involves more than science, technology, engineering, and math. As computers behave more like humans, the social sciences and humanities will grow in importance.
The Deep Learning Revolution
Terrence J. Sejnowski - 2018
Deep learning networks can play poker better than professional poker players and defeat a world champion at Go. In this book, Terry Sejnowski explains how deep learning went from being an arcane academic field to a disruptive technology in the information economy.
Sejnowski played an important role in the founding of deep learning, as one of a small group of researchers in the 1980s who challenged the prevailing logic-and-symbol-based version of AI. The new version of AI Sejnowski and others developed, which became deep learning, is fueled instead by data. Deep networks learn from data in the same way that babies experience the world, starting with fresh eyes and gradually acquiring the skills needed to navigate novel environments. Learning algorithms extract information from raw data; information can be used to create knowledge; knowledge underlies understanding; understanding leads to wisdom. Someday a driverless car will know the road better than you do and drive with more skill; a deep learning network will diagnose your illness; a personal cognitive assistant will augment your puny human brain. It took nature many millions of years to evolve human intelligence; AI is on a trajectory measured in decades. Sejnowski prepares us for a deep learning future.
Bayesian Statistics the Fun Way: Understanding Statistics and Probability with Star Wars, Lego, and Rubber Ducks
Will Kurt - 2019
Many people use data in ways they don't even understand, meaning they aren't getting the most from it. Bayesian Statistics the Fun Way will change that.
This book will give you a complete understanding of Bayesian statistics through simple explanations and un-boring examples. Find out the probability of UFOs landing in your garden, how likely Han Solo is to survive a flight through an asteroid shower, how to win an argument about conspiracy theories, and whether a burglary really was a burglary, to name a few examples.
By using these off-the-beaten-track examples, the author actually makes learning statistics fun. And you'll learn real skills, like how to:
- Measure your own level of uncertainty in a conclusion or belief
- Calculate Bayes' theorem and understand what it's useful for
- Find the posterior, likelihood, and prior to check the accuracy of your conclusions
- Calculate distributions to see the range of your data
- Compare hypotheses and draw reliable conclusions from them
Next time you find yourself with a sheaf of survey results and no idea what to do with them, turn to Bayesian Statistics the Fun Way to get the most value from your data.