R in a Nutshell: A Desktop Quick Reference


Joseph Adler - 2009
    R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics.The author introduces you to the R environment, including the R graphical user interface and console, and takes you through the fundamentals of the object-oriented R language. Then, through a variety of practical examples from medicine, business, and sports, you'll learn how you can use this remarkable tool to solve your own data analysis problems.Understand the basics of the language, including the nature of R objectsLearn how to write R functions and build your own packagesWork with data through visualization, statistical analysis, and other methodsExplore the wealth of packages contributed by the R communityBecome familiar with the lattice graphics package for high-level data visualizationLearn about bioinformatics packages provided by Bioconductor"I am excited about this book. R in a Nutshell is a great introduction to R, as well as a comprehensive reference for using R in data analytics and visualization. Adler provides 'real world' examples, practical advice, and scripts, making it accessible to anyone working with data, not just professional statisticians."

Machine Learning for Hackers


Drew Conway - 2012
    Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation.Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyze sample datasets and write simple machine learning algorithms. "Machine Learning for Hackers" is ideal for programmers from any background, including business, government, and academic research.Develop a naive Bayesian classifier to determine if an email is spam, based only on its textUse linear regression to predict the number of page views for the top 1,000 websitesLearn optimization techniques by attempting to break a simple letter cipherCompare and contrast U.S. Senators statistically, based on their voting recordsBuild a "whom to follow" recommendation system from Twitter data

Large-Scale C++ Software Design


John S. Lakos - 1996
    It is the first C++ book that actually demonstrates how to design large systems, and one of the few books on object-oriented design specifically geared to practical aspects of the C++ programming language. In this book, Lakos explains the process of decomposing large systems into physical (not inheritance) hierarchies of smaller, more manageable components. Such systems with their acyclic physical dependencies are fundamentally easier and more economical to maintain, test, and reuse than tightly interdependent systems. In addition to explaining the motivation for following good physical as well as logical design practices, Lakos provides you with a catalog of specific techniques designed to eliminate cyclic, compile-time, and link-time (physical) dependencies. He then extends these concepts from large to very large systems. The book concludes with a comprehensive top-down approach to the logical design of individual components. Appendices include a valuable design pattern Protocol Hierarchy designed to avoid fat inte

Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)


Jiawei Han - 2000
    Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generate data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has generated an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge.Like the first edition, voted the most popular data mining book by KD Nuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first edition, and new chapters have been added to address recent developments on mining complex types of data- including stream data, sequence data, graph structured data, social network data, and multi-relational data.A comprehensive, practical look at the concepts and techniques you need to know to get the most out of real business dataUpdates that incorporate input from readers, changes in the field, and more material on statistics and machine learningDozens of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projectsComplete classroom support for instructors at www.mkp.com/datamining2e companion site

Advanced Analytics with Spark


Sandy Ryza - 2015
    

Machine Learning for Dummies


John Paul Mueller - 2016
    Without machine learning, fraud detection, web search results, real-time ads on web pages, credit scoring, automation, and email spam filtering wouldn't be possible, and this is only showcasing just a few of its capabilities. Written by two data science experts, Machine Learning For Dummies offers a much-needed entry point for anyone looking to use machine learning to accomplish practical tasks.Covering the entry-level topics needed to get you familiar with the basic concepts of machine learning, this guide quickly helps you make sense of the programming languages and tools you need to turn machine learning-based tasks into a reality. Whether you're maddened by the math behind machine learning, apprehensive about AI, perplexed by preprocessing data--or anything in between--this guide makes it easier to understand and implement machine learning seamlessly.Grasp how day-to-day activities are powered by machine learning Learn to 'speak' certain languages, such as Python and R, to teach machines to perform pattern-oriented tasks and data analysis Learn to code in R using R Studio Find out how to code in Python using Anaconda Dive into this complete beginner's guide so you are armed with all you need to know about machine learning!

Foundations of Software Testing: ISTQB Certification


Dorothy Graham - 2006
    The coverage also features learning aids.

The Principles of Object-Oriented JavaScript


Nicholas C. Zakas - 2012
    It has no concept of classes, and you don't even need to define any objects in order to write code. But don't be fooled—JavaScript is an incredibly powerful and expressive object-oriented language that puts many design decisions right into your hands.In The Principles of Object-Oriented JavaScript, Nicholas C. Zakas thoroughly explores JavaScript's object-oriented nature, revealing the language's unique implementation of inheritance and other key characteristics. You'll learn: The difference between primitive and reference values What makes JavaScript functions so unique The various ways to create objects How to define your own constructors How to work with and understand prototypes Inheritance patterns for types and objects The Principles of Object-Oriented JavaScript will leave even experienced developers with a deeper understanding of JavaScript. Unlock the secrets behind how objects work in JavaScript so you can write clearer, more flexible, and more efficient code.

Bad Data Handbook: Cleaning Up The Data So You Can Get Back To Work


Q. Ethan McCallum - 2012
    In this handbook, data expert Q. Ethan McCallum has gathered 19 colleagues from every corner of the data arena to reveal how they’ve recovered from nasty data problems.From cranky storage to poor representation to misguided policy, there are many paths to bad data. Bottom line? Bad data is data that gets in the way. This book explains effective ways to get around it.Among the many topics covered, you’ll discover how to:Test drive your data to see if it’s ready for analysisWork spreadsheet data into a usable formHandle encoding problems that lurk in text dataDevelop a successful web-scraping effortUse NLP tools to reveal the real sentiment of online reviewsAddress cloud computing issues that can impact your analysis effortAvoid policies that create data analysis roadblocksTake a systematic approach to data quality analysis

Graph Databases


Ian Robinson - 2013
    With this practical book, you’ll learn how to design and implement a graph database that brings the power of graphs to bear on a broad range of problem domains. Whether you want to speed up your response to user queries or build a database that can adapt as your business evolves, this book shows you how to apply the schema-free graph model to real-world problems.Learn how different organizations are using graph databases to outperform their competitors. With this book’s data modeling, query, and code examples, you’ll quickly be able to implement your own solution.Model data with the Cypher query language and property graph modelLearn best practices and common pitfalls when modeling with graphsPlan and implement a graph database solution in test-driven fashionExplore real-world examples to learn how and why organizations use a graph databaseUnderstand common patterns and components of graph database architectureUse analytical techniques and algorithms to mine graph database information

Applying Domain-Driven Design and Patterns : With Examples in C# and .NET


Jimmy Nilsson - 2006
    While the examples in this guide are in C# and .NET, the principles can be used by developers using any language and IDE.

Hadoop: The Definitive Guide


Tom White - 2009
    Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Take advantage of HBase, Hadoop's database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master-not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

Discrete Mathematics and Its Applications


Kenneth H. Rosen - 2000
    These themes include mathematical reasoning, combinatorial analysis, discrete structures, algorithmic thinking, and enhanced problem-solving skills through modeling. Its intent is to demonstrate the relevance and practicality of discrete mathematics to all students. The Fifth Edition includes a more thorough and linear presentation of logic, proof types and proof writing, and mathematical reasoning. This enhanced coverage will provide students with a solid understanding of the material as it relates to their immediate field of study and other relevant subjects. The inclusion of applications and examples to key topics has been significantly addressed to add clarity to every subject. True to the Fourth Edition, the text-specific web site supplements the subject matter in meaningful ways, offering additional material for students and instructors. Discrete math is an active subject with new discoveries made every year. The continual growth and updates to the web site reflect the active nature of the topics being discussed. The book is appropriate for a one- or two-term introductory discrete mathematics course to be taken by students in a wide variety of majors, including computer science, mathematics, and engineering. College Algebra is the only explicit prerequisite.

The Data Warehouse Lifecycle Toolkit: Practical Techniques for Building Data Warehouse and Business Intelligence Systems


Ralph Kimball - 1998
    In that time, the data warehouse industry has reached full maturity and acceptance, hardware and software have made staggering advances, and the techniques promoted in the premiere edition of this book have been adopted by nearly all data warehouse vendors and practitioners. In addition, the term business intelligence emerged to reflect the mission of the data warehouse: wrangling the data out of source systems, cleaning it, and delivering it to add value to the business.Ralph Kimball and his colleagues have refined the original set of Lifecycle methods and techniques based on their consulting and training experience. The authors understand first-hand that a data warehousing/business intelligence (DW/BI) system needs to change as fast as its surrounding organization evolves. To that end, they walk you through the detailed steps of designing, developing, and deploying a DW/BI system. You'll learn to create adaptable systems that deliver data and analyses to business users so they can make better business decisions.

Database Management Systems


Raghu Ramakrishnan - 1997
    Coherent explanations and practical examples have made this one of the leading texts in the field. The third edition continues in this tradition, enhancing it with more practical material. The new edition has been reorganized to allow more flexibility in the way the course is taught. Now, instructors can easily choose whether they would like to teach a course which emphasizes database application development or a course that emphasizes database systems issues. New overview chapters at the beginning of parts make it possible to skip other chapters in the part if you don't want the detail.More applications and examples have been added throughout the book, including SQL and Oracle examples. The applied flavor is further enhanced by the two new database applications chapters.