Book picks similar to
Learning Apache Spark 2 by Muhammad Asif Abbasi


computer-science
computer-science-spark
parallel-computing
scala

Building Machine Learning Systems with Python


Willi Richert - 2013
    

D is for Digital: What a well-informed person should know about computers and communications


Brian W. Kernighan - 2011
    

Mining of Massive Datasets


Anand Rajaraman - 2011
    This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.

Data Smart: Using Data Science to Transform Information into Insight


John W. Foreman - 2013
    Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions.But how does one exactly do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope.Data science is little more than using straight-forward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet. Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype. But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data.Each chapter will cover a different technique in a spreadsheet so you can follow along: - Mathematical optimization, including non-linear programming and genetic algorithms- Clustering via k-means, spherical k-means, and graph modularity- Data mining in graphs, such as outlier detection- Supervised AI through logistic regression, ensemble models, and bag-of-words models- Forecasting, seasonal adjustments, and prediction intervals through monte carlo simulation- Moving from spreadsheets into the R programming languageYou get your hands dirty as you work alongside John through each technique. But never fear, the topics are readily applicable and the author laces humor throughout. You'll even learn what a dead squirrel has to do with optimization modeling, which you no doubt are dying to know.

Graph Databases


Ian Robinson - 2013
    With this practical book, you’ll learn how to design and implement a graph database that brings the power of graphs to bear on a broad range of problem domains. Whether you want to speed up your response to user queries or build a database that can adapt as your business evolves, this book shows you how to apply the schema-free graph model to real-world problems.Learn how different organizations are using graph databases to outperform their competitors. With this book’s data modeling, query, and code examples, you’ll quickly be able to implement your own solution.Model data with the Cypher query language and property graph modelLearn best practices and common pitfalls when modeling with graphsPlan and implement a graph database solution in test-driven fashionExplore real-world examples to learn how and why organizations use a graph databaseUnderstand common patterns and components of graph database architectureUse analytical techniques and algorithms to mine graph database information

Data Structures Using C++


D.S. Malik - 2003
    D.S. Malik is ideal for a one-semester course focused on data structures. Clearly written with the student in mind, this text focuses on Data Structures and includes advanced topics in C++ such as Linked Lists and the Standard Template Library (STL). This student-friendly text features abundant Programming Examples and extensive use of visual diagrams to reinforce difficult topics. Students will find Dr. Malik's use of complete programming code and clear display of syntax, explanation, and example easy to read and conducive to learning.

Programming Collective Intelligence: Building Smart Web 2.0 Applications


Toby Segaran - 2002
    With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:Collaborative filtering techniques that enable online retailers to recommend products or media Methods of clustering to detect groups of similar items in a large dataset Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm Optimization algorithms that search millions of possible solutions to a problem and choose the best one Bayesian filtering, used in spam filters for classifying documents based on word types and other features Using decision trees not only to make predictions, but to model the way decisions are made Predicting numerical values rather than classifications to build price models Support vector machines to match people in online dating sites Non-negative matrix factorization to find the independent features in a dataset Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game Each chapter includes exercises for extending the algorithms to make them more powerful. Go beyond simple database-backed applications and put the wealth of Internet data to work for you. "Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details."-- Dan Russell, Google "Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths."-- Tim Wolters, CTO, Collective Intellect

Database Management Systems


Raghu Ramakrishnan - 1997
    Coherent explanations and practical examples have made this one of the leading texts in the field. The third edition continues in this tradition, enhancing it with more practical material. The new edition has been reorganized to allow more flexibility in the way the course is taught. Now, instructors can easily choose whether they would like to teach a course which emphasizes database application development or a course that emphasizes database systems issues. New overview chapters at the beginning of parts make it possible to skip other chapters in the part if you don't want the detail.More applications and examples have been added throughout the book, including SQL and Oracle examples. The applied flavor is further enhanced by the two new database applications chapters.

Introduction to Computer Theory


Daniel I.A. Cohen - 1986
    Covers all the topics needed by computer scientists with a sometimes humorous approach that reviewers found refreshing. The goal of the book is to provide a firm understanding of the principles and the big picture of where computer theory fits into the field.

Data Science from Scratch: First Principles with Python


Joel Grus - 2015
    In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out. Get a crash course in Python Learn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data science Collect, explore, clean, munge, and manipulate data Dive into the fundamentals of machine learning Implement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering Explore recommender systems, natural language processing, network analysis, MapReduce, and databases

Metadata


Jeffrey Pomerantz - 2015
    When "metadata" became breaking news, appearing in stories about surveillance by the National Security Agency, many members of the public encountered this once-obscure term from information science for the first time. Should people be reassured that the NSA was "only" collecting metadata about phone calls--information about the caller, the recipient, the time, the duration, the location--and not recordings of the conversations themselves? Or does phone call metadata reveal more than it seems? In this book, Jeffrey Pomerantz offers an accessible and concise introduction to metadata.In the era of ubiquitous computing, metadata has become infrastructural, like the electrical grid or the highway system. We interact with it or generate it every day. It is not, Pomerantz tell us, just "data about data." It is a means by which the complexity of an object is represented in a simpler form. For example, the title, the author, and the cover art are metadata about a book. When metadata does its job well, it fades into the background; everyone (except perhaps the NSA) takes it for granted.Pomerantz explains what metadata is, and why it exists. He distinguishes among different types of metadata--descriptive, administrative, structural, preservation, and use--and examines different users and uses of each type. He discusses the technologies that make modern metadata possible, and he speculates about metadata's future. By the end of the book, readers will see metadata everywhere. Because, Pomerantz warns us, it's metadata's world, and we are just living in it.

Prometheus: Up & Running: Infrastructure and Application Performance Monitoring


Brian Brazil - 2018
    This practical guide provides application developers, sysadmins, and DevOps practitioners with a hands-on introduction to the most important aspects of Prometheus, including dashboarding and alerting, direct code instrumentation, and metric collection from third-party systems with exporters.This open source system has gained popularity over the past few years for good reason. With its simple yet powerful data model and query language, Prometheus does one thing, and it does it well. Author and Prometheus developer Brian Brazil guides you through Prometheus setup, the Node exporter, and the Alertmanager, then demonstrates how to use them for application and infrastructure monitoring.Know where and how much to apply instrumentation to your application codeIdentify metrics with labels using unique key-value pairsGet an introduction to Grafana, a popular tool for building dashboardsLearn how to use the Node Exporter to monitor your infrastructureUse service discovery to provide different views of your machines and servicesUse Prometheus with Kubernetes and examine exporters you can use with containersConvert data from other monitoring systems into the Prometheus format

R for Dummies


Joris Meys - 2012
    R is packed with powerful programming capabilities, but learning to use R in the real world can be overwhelming for even the most seasoned statisticians. This easy-to-follow guide explains how to use R for data processing and statistical analysis, and then, shows you how to present your data using compelling and informative graphics. You'll gain practical experience using R in a variety of settings and delve deeper into R's feature-rich toolset.Includes tips for the initial installation of RDemonstrates how to easily perform calculations on vectors, arrays, and lists of dataShows how to effectively visualize data using R's powerful graphics packagesGives pointers on how to find, install, and use add-on packages created by the R communityProvides tips on getting additional help from R mailing lists and websitesWhether you're just starting out with statistical analysis or are a procedural programming pro, "R For Dummies" is the book you need to get the most out of R.

SEO 2016: Learn Search Engine Optimization (SEO Books Series)


R.L. Adams - 2015
    It's certainly no walk in the park. And, depending on where you've been for your information when it comes to SEO, it might be outdated, or just flat-out wrong. Why is that? Search has been evolving at an uncanny rate in recent years. And, if you're not in the know, then you could end up spinning your wheels and wasting valuable and precious time and resources on techniques that no longer work. The main reason for the recent changes: to increase relevancy. Google's sole mission is to provide the most relevant search results at the top of its searches, in the quickest manner possible. But, in recent years, due to some mischievous behavior at the hand of a small group of people, relevancy began to wane. SEO 2016 :: Understanding Google's Algorithm Adjustments The field of SEO has been changing, all led by Google's onslaught of algorithm adjustments that have decimated and razed some sites while uplifting and building others. Since 2011, Google has made it its mission to hunt out and demote spammy sites that sacrifice user-experience, focus on thin content, or simply spend their time trying to trick and deceive their way to the top of its search results. At the same time, Google has increased its reliance on four major components of trust, that work at the heart of its search algorithm: Trust in Age Trust in Authority Trust in Content Relevancy In this book, you'll learn just how each of these affects Google's search results, and just how you can best optimize your site and content to ensure that you're playing by Google's many rules. And, although there have been many algorithm adjustments over the years, four major ones have shaped and forever changed the search engine landscape: Google Panda Google Penguin Google Hummingbird Google Mobilegeddon We'll discuss the nature of these changes and just how each of these algorithm adjustments have shaped the current landscape in search engine optimization. So what does it take to rank your site today? In order to compete at any level in SEO, you have to earn trust - Google's trust that is. But, what does that take? How can we build trust quickly without jumping through all the hoops? SEO is by no means a small feat. It takes hard work applied consistently overtime. There are no overnight success stories when it comes to SEO. But there are certainly ways to navigate the stormy online waters of Google's highly competitive search. Download SEO 2016 :: Learn Search Engine Optimization Lift the veil on Google's complex search algorithm, and understand just what it takes to rank on Google searches today, not yesterday.

Big data @ work : dispelling the myths, uncovering the opportunities


Thomas H. Davenport - 2014
    The author was—at first.When the term “big data” first came on the scene, bestselling author Tom Davenport (Competing on Analytics, Analytics at Work) thought it was just another example of technology hype. But his research in the years that followed changed his mind.Now, in clear, conversational language, Davenport explains what big data means—and why everyone in business needs to know about it. Big Data at Work covers all the bases: what big data means from a technical, consumer, and management perspective; what its opportunities and costs are; where it can have real business impact; and which aspects of this hot topic have been oversold.This book will help you understand:• Why big data is important to you and your organization• What technology you need to manage it• How big data could change your job, your company, and your industry• How to hire, rent, or develop the kinds of people who make big data work• The key success factors in implementing any big data project• How big data is leading to a new approach to managing analyticsWith dozens of company examples, including UPS, GE, Amazon, United Healthcare, Citigroup, and many others, this book will help you seize all opportunities—from improving decisions, products, and services to strengthening customer relationships. It will show you how to put big data to work in your own organization so that you too can harness the power of this ever-evolving new resource.