Book picks similar to
The Data Engineering Cookbook by Andreas Kretz
computer-science
data-engineering
tech
0-april
Data Science at the Command Line: Facing the Future with Time-Tested Tools
Jeroen Janssens - 2014
You'll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.To get you started--whether you're on Windows, OS X, or Linux--author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.Discover why the command line is an agile, scalable, and extensible technology. Even if you're already comfortable processing data with, say, Python or R, you'll greatly improve your data science workflow by also leveraging the power of the command line.Obtain data from websites, APIs, databases, and spreadsheetsPerform scrub operations on plain text, CSV, HTML/XML, and JSONExplore data, compute descriptive statistics, and create visualizationsManage your data science workflow using DrakeCreate reusable tools from one-liners and existing Python or R codeParallelize and distribute data-intensive pipelines using GNU ParallelModel data with dimensionality reduction, clustering, regression, and classification algorithms
React: Up and Running
Stoyan Stefanov - 2015
With "React: Up and Running" you'll learn how to get off the ground with React, with no prior knowledge.This book teaches you how to build components, the building blocks of your apps, as well as how to organize the components into large-scale apps. In addition, you ll learn about unit testing and optimizing performance, while focusing on the application s data (and letting the UI take care of itself)."
Data Smart: Using Data Science to Transform Information into Insight
John W. Foreman - 2013
Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions.But how does one exactly do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope.Data science is little more than using straight-forward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet. Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype. But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data.Each chapter will cover a different technique in a spreadsheet so you can follow along: - Mathematical optimization, including non-linear programming and genetic algorithms- Clustering via k-means, spherical k-means, and graph modularity- Data mining in graphs, such as outlier detection- Supervised AI through logistic regression, ensemble models, and bag-of-words models- Forecasting, seasonal adjustments, and prediction intervals through monte carlo simulation- Moving from spreadsheets into the R programming languageYou get your hands dirty as you work alongside John through each technique. But never fear, the topics are readily applicable and the author laces humor throughout. You'll even learn what a dead squirrel has to do with optimization modeling, which you no doubt are dying to know.
Machine Learning for Dummies
John Paul Mueller - 2016
Without machine learning, fraud detection, web search results, real-time ads on web pages, credit scoring, automation, and email spam filtering wouldn't be possible, and this is only showcasing just a few of its capabilities. Written by two data science experts, Machine Learning For Dummies offers a much-needed entry point for anyone looking to use machine learning to accomplish practical tasks.Covering the entry-level topics needed to get you familiar with the basic concepts of machine learning, this guide quickly helps you make sense of the programming languages and tools you need to turn machine learning-based tasks into a reality. Whether you're maddened by the math behind machine learning, apprehensive about AI, perplexed by preprocessing data--or anything in between--this guide makes it easier to understand and implement machine learning seamlessly.Grasp how day-to-day activities are powered by machine learning Learn to 'speak' certain languages, such as Python and R, to teach machines to perform pattern-oriented tasks and data analysis Learn to code in R using R Studio Find out how to code in Python using Anaconda Dive into this complete beginner's guide so you are armed with all you need to know about machine learning!
Mining of Massive Datasets
Anand Rajaraman - 2011
This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.
Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software
Michael Sikorski - 2011
When malware breaches your defenses, you need to act quickly to cure current infections and prevent future ones from occurring.For those who want to stay ahead of the latest malware, Practical Malware Analysis will teach you the tools and techniques used by professional analysts. With this book as your guide, you'll be able to safely analyze, debug, and disassemble any malicious software that comes your way.You'll learn how to:Set up a safe virtual environment to analyze malware Quickly extract network signatures and host-based indicators Use key analysis tools like IDA Pro, OllyDbg, and WinDbg Overcome malware tricks like obfuscation, anti-disassembly, anti-debugging, and anti-virtual machine techniques Use your newfound knowledge of Windows internals for malware analysis Develop a methodology for unpacking malware and get practical experience with five of the most popular packers Analyze special cases of malware with shellcode, C++, and 64-bit code Hands-on labs throughout the book challenge you to practice and synthesize your skills as you dissect real malware samples, and pages of detailed dissections offer an over-the-shoulder look at how the pros do it. You'll learn how to crack open malware to see how it really works, determine what damage it has done, thoroughly clean your network, and ensure that the malware never comes back.Malware analysis is a cat-and-mouse game with rules that are constantly changing, so make sure you have the fundamentals. Whether you're tasked with securing one network or a thousand networks, or you're making a living as a malware analyst, you'll find what you need to succeed in Practical Malware Analysis.
Head First HTML5 Programming
Eric Freeman - 2011
Sure, HTML started as a mere markup language, but more recently HTML’s put on some major muscle. Now we’ve got a language tuned for building web applications with Web storage, 2D drawing, offline support, sockets and threads, and more. And to speak this language you’ve got to go beyond HTML5 markup and into the world of the DOM, events, and JavaScript APIs. Now you probably already know all about HTML markup (otherwise known as structure) and you know all aboutCSS style (presentation), but what you’ve been missing is JavaScript (behavior). If all you know about are structure and presentation, you can create some great looking pages, but they’re still just pages. When you add behavior with JavaScript, you can create an interactive experience; even better, you can create full blown web applications.Head First HTML5 Programming is your ultimate tour guide to creating web applications with HTML5 and JavaScript, and we give you everything you need to know to build them, including: how to add interactivity to your pages, how to communicate with the world of Web services, and how to use the great new APIs being developed for HTML5. Here are just some of the things you’ll learn in Head First HTML5 Programing:Learn how to make your pages truly interactive by using the power of the DOM.Finally understand how JavaScript works and take yourself from novice to well-informed in just a few chapters.Learn how JavaScript APIs fit into the HTML5 ecosystem, and how to use any API in your web pages.Use the Geolocation API to know where your users are.Bring out your inner artist with Canvas, HTML5’s new 2D drawing surface.Go beyond just plugging a video into your pages, and create custom video experiences.Learn the secret to grabbing five megabytes of storage in every user’s browser.Improve your page’s responsiveness and performance with Web workers.And much more.
Big data @ work : dispelling the myths, uncovering the opportunities
Thomas H. Davenport - 2014
The author was—at first.When the term “big data” first came on the scene, bestselling author Tom Davenport (Competing on Analytics, Analytics at Work) thought it was just another example of technology hype. But his research in the years that followed changed his mind.Now, in clear, conversational language, Davenport explains what big data means—and why everyone in business needs to know about it. Big Data at Work covers all the bases: what big data means from a technical, consumer, and management perspective; what its opportunities and costs are; where it can have real business impact; and which aspects of this hot topic have been oversold.This book will help you understand:• Why big data is important to you and your organization• What technology you need to manage it• How big data could change your job, your company, and your industry• How to hire, rent, or develop the kinds of people who make big data work• The key success factors in implementing any big data project• How big data is leading to a new approach to managing analyticsWith dozens of company examples, including UPS, GE, Amazon, United Healthcare, Citigroup, and many others, this book will help you seize all opportunities—from improving decisions, products, and services to strengthening customer relationships. It will show you how to put big data to work in your own organization so that you too can harness the power of this ever-evolving new resource.
The IDA Pro Book: The Unofficial Guide to the World's Most Popular Disassembler
Chris Eagle - 2008
With IDA Pro, you live in a source code-optional world. IDA can automatically analyze the millions of opcodes that make up an executable and present you with a disassembly. But at that point, your work is just beginning. With The IDA Pro Book, you'll learn how to turn that mountain of mnemonics into something you can actually use.Hailed by the creator of IDA Pro as the "long-awaited" and "information-packed" guide to IDA, The IDA Pro Book covers everything from the very first steps to advanced automation techniques. While other disassemblers slow your analysis with inflexibility, IDA invites you to customize its output for improved readability and usefulness. You'll save time and effort as you learn to:Identify known library routines, so you can focus your analysis on other areas of the code Extend IDA to support new processors and filetypes, making disassembly possible for new or obscure architectures Explore popular plug-ins that make writing IDA scripts easier, allow collaborative reverse engineering, and much more Utilize IDA's built-in debugger to tackle obfuscated code that would defeat a stand-alone disassembler You'll still need serious assembly skills to tackle the toughest executables, but IDA makes things a lot easier. Whether you're analyzing the software on a black box or conducting hard-core vulnerability research, a mastery of IDA Pro is crucial to your success. Take your skills to the next level with The IDA Pro Book.
Machine Learning: A Probabilistic Perspective
Kevin P. Murphy - 2012
Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach.The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package—PMTK (probabilistic modeling toolkit)—that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
R in a Nutshell: A Desktop Quick Reference
Joseph Adler - 2009
R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics.The author introduces you to the R environment, including the R graphical user interface and console, and takes you through the fundamentals of the object-oriented R language. Then, through a variety of practical examples from medicine, business, and sports, you'll learn how you can use this remarkable tool to solve your own data analysis problems.Understand the basics of the language, including the nature of R objectsLearn how to write R functions and build your own packagesWork with data through visualization, statistical analysis, and other methodsExplore the wealth of packages contributed by the R communityBecome familiar with the lattice graphics package for high-level data visualizationLearn about bioinformatics packages provided by Bioconductor"I am excited about this book. R in a Nutshell is a great introduction to R, as well as a comprehensive reference for using R in data analytics and visualization. Adler provides 'real world' examples, practical advice, and scripts, making it accessible to anyone working with data, not just professional statisticians."
Taming Text: How to Find, Organize, and Manipulate It
Grant S. Ingersoll - 2011
This causes real problems for everyday users who need to make sense of all the information available, and for software engineers who want to make their text-based applications more useful and user-friendly. Whether building a search engine for a corporate website, automatically organizing email, or extracting important nuggets of information from the news, dealing with unstructured text can be daunting.Taming Text is a hands-on, example-driven guide to working with unstructured text in the context of real-world applications. It explores how to automatically organize text, using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. This book gives examples illustrating each of these topics, as well as the foundations upon which they are built.Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.
Laravel: Code Bright
Dayle Rees - 2013
At $29 and cheaper than a good pizza, you will get the book in its current partial form, along with all future chapters, updates, and fixes for free. As of the day I wrote this description, Code Bright had 130 pages and was just getting started. To give you some perspective on how detailed it is, Code Happy was 127 pages in its complete state. Want to know more? Carry on reading.Welcome back to Laravel. Last year I wrote a book about the Laravel PHP framework. It started as a collection of tutorials on my blog, and eventually became a full book. I definitely didn’t expect it to be as popular as it was. Code Happy has sold almost 3000 copies, and is considered to be one of the most valuable resourcesfor learning the Laravel framework.Code Bright is the spiritual successor to Code Happy. The framework has grown a lot in the past year, and has changed enough to merit a new title. With Code Bright I hope to improve on Code Happy with every way, my goal is, to once again, build the most comprehensive learning experience for the framework. Oh, and to still be funny. That’s very important to me.Laravel Code Bright will contain a complete learning experience for all of the framework’s features. The style of writing will make it approachable for beginners, and a wonderful reference resource for experienced developers alike.You see, people have told me that they enjoyed reading Code Happy, not only for its educational content, but for its humour, and for my down to earth writing style. This is very important to me. I like to write my books as if we were having a conversation in a bar.When I wrote Code Happy last year, I was simply a framework enthusiast. One of the first to share information about the framework. However, since then I have become a committed member of the core development team. Working directly with the framework author to make Laravel a wonderful experience for the developers of the world.One other important feature of both books, is that they are published while in progress. This means that the book is available in an incomplete state, but will grow over time into a complete title. All future updates will be provided for free.What this means is that I don’t have to worry about deadlines, or a fixed point of completion. It leads to less stress and better writing. If I think of a better way to explain something, I can go back and change it. In a sense, the book will never be completed. I can constantly add more information to it, until it becomes the perfect resource.Given that this time I am using the majority of my spare time to write the title (yes, I have a full time job too!), I have raised the price a little to justify my invested time. I was told by many of my past readers that they found the previous title very cheap for the resource that it grew into, so if you are worried about the new price, then let me remind you what you will get for your 29 bucks.The successor to Code Happy, seen by many as the #1 learning resource for the Laravel PHP framework.An unending source of information, chapters will be constantly added as needed until the book becomes a giant vault of framework knowledge.Comedy, and a little cheesy, but very friendly writing.
Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems)
Jiawei Han - 2000
Not only are all of our business, scientific, and government transactions now computerized, but the widespread use of digital cameras, publication tools, and bar codes also generate data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the World Wide Web have flooded us with a tremendous amount of data. This explosive growth has generated an even more urgent need for new techniques and automated tools that can help us transform this data into useful information and knowledge.Like the first edition, voted the most popular data mining book by KD Nuggets readers, this book explores concepts and techniques for the discovery of patterns hidden in large data sets, focusing on issues relating to their feasibility, usefulness, effectiveness, and scalability. However, since the publication of the first edition, great progress has been made in the development of new data mining methods, systems, and applications. This new edition substantially enhances the first edition, and new chapters have been added to address recent developments on mining complex types of data- including stream data, sequence data, graph structured data, social network data, and multi-relational data.A comprehensive, practical look at the concepts and techniques you need to know to get the most out of real business dataUpdates that incorporate input from readers, changes in the field, and more material on statistics and machine learningDozens of algorithms and implementation examples, all in easily understood pseudo-code and suitable for use in real-world, large-scale data mining projectsComplete classroom support for instructors at www.mkp.com/datamining2e companion site
Planning for Big Data
Edd Wilder-James - 2004
From creating new data-driven products through to increasing operational efficiency, big data has the potential to makeyour organization both more competitive and more innovative.As this emerging field transitions from the bleeding edge to enterprise infrastructure, it's vital to understand not only the technologies involved, but the organizational and cultural demands of being data-driven.Written by O'Reilly Radar's experts on big data, this anthology describes:- The broad industry changes heralded by the big data era- What big data is, what it means to your business, and how to start solving data problems- The software that makes up the Hadoop big data stack, and the major enterprise vendors' Hadoop solutions- The landscape of NoSQL databases and their relative merits- How visualization plays an important part in data work