Book picks similar to
Learning Apache Spark 2 by Muhammad Asif Abbasi
big-data
computer-science
computer-science-spark
parallel-computing
The Art of Data Science: A Guide for Anyone Who Works with Data
Roger D. Peng - 2015
The authors have extensive experience both managing data analysts and conducting their own data analyses, and have carefully observed what produces coherent results and what fails to produce useful insights into data. This book is a distillation of their experience in a format that is applicable to both practitioners and managers in data science.
Taming Text: How to Find, Organize, and Manipulate It
Grant S. Ingersoll - 2011
This causes real problems for everyday users who need to make sense of all the information available, and for software engineers who want to make their text-based applications more useful and user-friendly. Whether building a search engine for a corporate website, automatically organizing email, or extracting important nuggets of information from the news, dealing with unstructured text can be daunting.Taming Text is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. It explores how to automatically organize text, using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. This book gives examples illustrating each of these topics, as well as the foundations upon which they are built.Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.
Good Charts: The HBR Guide to Making Smarter, More Persuasive Data Visualizations
Scott Berinato - 2016
No longer. A new generation of tools and massive amounts of available data make it easy for anyone to create visualizations that communicate ideas far more effectively than generic spreadsheet charts ever could.What’s more, building good charts is quickly becoming a need-to-have skill for managers. If you’re not doing it, other managers are, and they’re getting noticed for it and getting credit for contributing to your company’s success.In Good Charts, dataviz maven Scott Berinato provides an essential guide to how visualization works and how to use this new language to impress and persuade. Dataviz today is where spreadsheets and word processors were in the early 1980s—on the cusp of changing how we work. Berinato lays out a system for thinking visually and building better charts through a process of talking, sketching, and prototyping.This book is much more than a set of static rules for making visualizations. It taps into both well-established and cutting-edge research in visual perception and neuroscience, as well as the emerging field of visualization science, to explore why good charts (and bad ones) create “feelings behind our eyes.” Along the way, Berinato also includes many engaging vignettes of dataviz pros, illustrating the ideas in practice.Good Charts will help you turn plain, uninspiring charts that merely present information into smart, effective visualizations that powerfully convey ideas.
Numsense! Data Science for the Layman: No Math Added
Annalyn Ng - 2017
Sold in over 85 countries and translated into more than 5 languages.---------------Want to get started on data science?Our promise: no math added.This book has been written in layman's terms as a gentle introduction to data science and its algorithms. Each algorithm has its own dedicated chapter that explains how it works, and shows an example of a real-world application. To help you grasp key concepts, we stick to intuitive explanations and visuals.Popular concepts covered include:- A/B Testing- Anomaly Detection- Association Rules- Clustering- Decision Trees and Random Forests- Regression Analysis- Social Network Analysis- Neural NetworksFeatures:- Intuitive explanations and visuals- Real-world applications to illustrate each algorithm- Point summaries at the end of each chapter- Reference sheets comparing the pros and cons of algorithms- Glossary list of commonly-used termsWith this book, we hope to give you a practical understanding of data science, so that you, too, can leverage its strengths in making better decisions.
Big Data for Dummies
Judith Hurwitz - 2013
Data sets such as customer transactions for a mega-retailer, weather patterns monitored by meteorologists, or social network activity can quickly outpace the capacity of traditional data management tools. If you need to develop or manage big data solutions, you'll appreciate how these four experts define, explain, and guide you through this new and often confusing concept. You'll learn what it is, why it matters, and how to choose and implement solutions that work.Effectively managing big data is an issue of growing importance to businesses, not-for-profit organizations, government, and IT professionals Authors are experts in information management, big data, and a variety of solutions Explains big data in detail and discusses how to select and implement a solution, security concerns to consider, data storage and presentation issues, analytics, and much more Provides essential information in a no-nonsense, easy-to-understand style that is empowering Big Data For Dummies cuts through the confusion and helps you take charge of big data solutions for your organization.
Race Against The Machine
Erik Brynjolfsson - 2011
Drawing on research by their team at the Center for Digital Business, they show that there's been no stagnation in technology -- in fact, the digital revolution is accelerating. Recent advances are the stuff of science fiction: computers now drive cars in traffic, translate between human languages effectively, and beat the best human Jeopardy! players.As these examples show, digital technologies are rapidly encroaching on skills that used to belong to humans alone. This phenomenon is both broad and deep, and has profound economic implications. Many of these implications are positive; digital innovation increases productivity, reduces prices (sometimes to zero), and grows the overall economic pie.But digital innovation has also changed how the economic pie is distributed, and here the news is not good for the median worker. As technology races ahead, it can leave many people behind. Workers whose skills have been mastered by computers have less to offer the job market, and see their wages and prospects shrink. Entrepreneurial business models, new organizational structures and different institutions are needed to ensure that the average worker is not left behind by cutting-edge machines.In Race Against the Machine Brynjolfsson and McAfee bring together a range of statistics, examples, and arguments to show that technological progress is accelerating, and that this trend has deep consequences for skills, wages, and jobs. The book makes the case that employment prospects are grim for many today not because there's been technology has stagnated, but instead because we humans and our organizations aren't keeping up.
How to Write the Perfect Resume: Stand Out, Land Interviews, and Get the Job You Want
Dan Clay - 2018
As you read through the job description, your excitement builds as you realize that the job is a perfect fit! Not wasting another second, you fill out the application, attach your resume, and hold your breath as you hit “Apply.” Then you wait. And wait. And wait some more. Weeks go by without hearing so much as a peep, and before long you’ve given up hope on what seemed like a match made in heaven. Sound familiar? You’re not alone! On average there are 250 resumes submitted for every job opening, which means that 99.6% of applicants will fail to land the jobs they apply for. To get the job you want, you don’t just need a great resume--you need an outstanding resume, one that puts you in the top 1% of candidates for the job. That means ditching the same old advice you’ve been following with little results and adopting a tried-and-true process for getting your resume noticed in even the most competitive situations. In this book, Dan Clay breaks down the exact method he’s carefully developed over a period of ten years and provides a precise, step-by-step set of instructions for crafting the perfect resume, down to the last period. Unlike the dime-a-dozen recruiters turned career coaches who have never had to put themselves on the line in today’s brutally competitive job market, Dan offers practical, real-world experience gained from applying for and getting job offers from some of the most prestigious, competitive companies in the world. And when it comes to something as important as your career, don’t you deserve to learn from someone who’s actually succeeded at doing what you’re hoping to do? Of course you do! Here are some of the things you’ll learn about how to transform your resume from average to awe-inspiring: How to handle tricky pitfalls like extended time off or unemployment and have your resume come out as strong as ever How to make your accomplishments sound dramatically more impressive without having to tell a single lie How to remove the guesswork about what to include in your resume and build it to the exacting specifications of your target job's requirements How to pass the four tests that companies will put your resume through with flying colors How to strike the perfect composition of content, white space, and page length to accentuate and differentiate your strengths How to avoid the common (and not so common) resume mistakes that leave your resume dead on arrival How to tell a powerful story that demonstrates your capabilities in a way that will knock the socks off anyone reading it How to stand out without resorting to cheap tricks that come off as cheesy or over-the-top PLUS, you’ll also gain access to a free companion website containing fully editable resume templates, a perfect resume checklist, and other bonus materials to give you everything you need to create a stunning resume that will get you noticed and land you interviews. Whether you’re a new graduate looking for your first job, a career veteran angling for your next move, a recent victim of a layoff, or someone looking to dip their toes back int
An Introduction to Statistical Learning: With Applications in R
Gareth James - 2013
This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree- based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
Storytelling with Data: A Data Visualization Guide for Business Professionals
Cole Nussbaumer Knaflic - 2015
You'll discover the power of storytelling and the way to make data a pivotal point in your story. The lessons in this illuminative text are grounded in theory, but made accessible through numerous real-world examples--ready for immediate application to your next graph or presentation.Storytelling is not an inherent skill, especially when it comes to data visualization, and the tools at our disposal don't make it any easier. This book demonstrates how to go beyond conventional tools to reach the root of your data, and how to use your data to create an engaging, informative, compelling story. Specifically, you'll learn how to:Understand the importance of context and audience Determine the appropriate type of graph for your situation Recognize and eliminate the clutter clouding your information Direct your audience's attention to the most important parts of your data Think like a designer and utilize concepts of design in data visualization Leverage the power of storytelling to help your message resonate with your audience Together, the lessons in this book will help you turn your data into high impact visual stories that stick with your audience. Rid your world of ineffective graphs, one exploding 3D pie chart at a time. There is a story in your data--Storytelling with Data will give you the skills and power to tell it!
Introductory Statistics with R
Peter Dalgaard - 2002
It can be freely downloaded and it works on multiple computer platforms. This book provides an elementary introduction to R. In each chapter, brief introductory sections are followed by code examples and comments from the computational and statistical viewpoint. A supplementary R package containing the datasets can be downloaded from the web.
MongoDB: The Definitive Guide
Kristina Chodorow - 2010
Learn how easy it is to handle data as self-contained JSON-style documents, rather than as records in a relational database.Explore ways that document-oriented storage will work for your projectLearn how MongoDB’s schema-free data model handles documents, collections, and multiple databasesExecute basic write operations, and create complex queries to find data with any criteriaUse indexes, aggregation tools, and other advanced query techniquesLearn about monitoring, security and authentication, backup and repair, and moreSet up master-slave and automatic failover replication in MongoDBUse sharding to scale MongoDB horizontally, and learn how it impacts applicationsGet example applications written in Java, PHP, Python, and Ruby
Streaming Systems
Tyler Akidau - 2018
As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way.Expanded from Tyler Akidau's popular blog posts Streaming 101 and Streaming 102, this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You'll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.You'll explore:How streaming and batch data processing patterns compareThe core principles and concepts behind robust out-of-order data processingHow watermarks track progress and completeness in infinite datasetsHow exactly-once data processing techniques ensure correctnessHow the concepts of streams and tables form the foundations of both batch and streaming data processingThe practical motivations behind a powerful persistent state mechanism, driven by a real-world exampleHow time-varying relations provide a link between stream processing and the world of SQL and relational algebra
The Black Box Society: The Secret Algorithms That Control Money and Information
Frank Pasquale - 2014
The data compiled and portraits created are incredibly detailed, to the point of being invasive. But who connects the dots about what firms are doing with this information? The Black Box Society argues that we all need to be able to do so--and to set limits on how big data affects our lives.Hidden algorithms can make (or ruin) reputations, decide the destiny of entrepreneurs, or even devastate an entire economy. Shrouded in secrecy and complexity, decisions at major Silicon Valley and Wall Street firms were long assumed to be neutral and technical. But leaks, whistleblowers, and legal disputes have shed new light on automated judgment. Self-serving and reckless behavior is surprisingly common, and easy to hide in code protected by legal and real secrecy. Even after billions of dollars of fines have been levied, underfunded regulators may have only scratched the surface of this troubling behavior.Frank Pasquale exposes how powerful interests abuse secrecy for profit and explains ways to rein them in. Demanding transparency is only the first step. An intelligible society would assure that key decisions of its most important firms are fair, nondiscriminatory, and open to criticism. Silicon Valley and Wall Street need to accept as much accountability as they impose on others.
Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
Matthew A. Russell - 2011
You’ll learn how to combine social web data, analysis techniques, and visualization to find what you’ve been looking for in the social haystack—as well as useful information you didn’t know existed.Each standalone chapter introduces techniques for mining data in different areas of the social Web, including blogs and email. All you need to get started is a programming background and a willingness to learn basic Python tools.Get a straightforward synopsis of the social web landscapeUse adaptable scripts on GitHub to harvest data from social network APIs such as Twitter, Facebook, LinkedIn, and Google+Learn how to employ easy-to-use Python tools to slice and dice the data you collectExplore social connections in microformats with the XHTML Friends NetworkApply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detectionBuild interactive visualizations with web technologies based upon HTML5 and JavaScript toolkits"A rich, compact, useful, practical introduction to a galaxy of tools, techniques, and theories for exploring structured and unstructured data." --Alex Martelli, Senior Staff Engineer, Google
Social and Economic Networks
Matthew O. Jackson - 2008
The many aspects of our lives that are governed by social networks make it critical to understand how they impact behavior, which network structures are likely to emerge in a society, and why we organize ourselves as we do. In Social and Economic Networks, Matthew Jackson offers a comprehensive introduction to social and economic networks, drawing on the latest findings in economics, sociology, computer science, physics, and mathematics. He provides empirical background on networks and the regularities that they exhibit, and discusses random graph-based models and strategic models of network formation. He helps readers to understand behavior in networked societies, with a detailed analysis of learning and diffusion in networks, decision making by individuals who are influenced by their social neighbors, game theory and markets on networks, and a host of related subjects. Jackson also describes the varied statistical and modeling techniques used to analyze social networks. Each chapter includes exercises to aid students in their analysis of how networks function.This book is an indispensable resource for students and researchers in economics, mathematics, physics, sociology, and business.