High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark


Holden Karau - 2017
    But if you haven't seen the performance improvements you expected, or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run faster and handle larger data sizes, while using fewer resources.Ideal for software engineers, data engineers, developers, and system administrators working with large-scale data applications, this book describes techniques that can reduce data infrastructure costs and developer hours. Not only will you gain a more comprehensive understanding of Spark, you'll also learn how to make it sing.With this book, you'll explore:How Spark SQL's new interfaces improve performance over SQL's RDD data structureThe choice between data joins in Core Spark and Spark SQLTechniques for getting the most out of standard RDD transformationsHow to work around performance issues in Spark's key/value pair paradigmWriting high-performance Spark code without Scala or the JVMHow to test for functionality and performance when applying suggested improvementsUsing Spark MLlib and Spark ML machine learning librariesSpark's Streaming components and external community packages

The Unofficial Guide to Ethical Hacking


Ankit Fadia - 2001
    For anyone interested in finding out how your fail-safe system was cracked and how you can better protect yourself, this book is a must-read. It contains helpful resources that you can reference to better protect your system from becoming the victim of attacks. It also includes discussion on the nature of file encryption, firewalls, and viruses and shows how users can make their systems more secure.

Machine Learning With Random Forests And Decision Trees: A Mostly Intuitive Guide, But Also Some Python


Scott Hartshorn - 2016
    They are typically used to categorize something based on other data that you have. The purpose of this book is to help you understand how Random Forests work, as well as the different options that you have when using them to analyze a problem. Additionally, since Decision Trees are a fundamental part of Random Forests, this book explains how they work. This book is focused on understanding Random Forests at the conceptual level. Knowing how they work, why they work the way that they do, and what options are available to improve results. This book covers how Random Forests work in an intuitive way, and also explains the equations behind many of the functions, but it only has a small amount of actual code (in python). This book is focused on giving examples and providing analogies for the most fundamental aspects of how random forests and decision trees work. The reason is that those are easy to understand and they stick with you. There are also some really interesting aspects of random forests, such as information gain, feature importances, or out of bag error, that simply cannot be well covered without diving into the equations of how they work. For those the focus is providing the information in a straight forward and easy to understand way.

Samurai Champloo, Volume 1


Masaru Gotsubo - 2005
    A hardworking waitress, an arrogant mercenary, and a mysterious samurai form an uneasy alliance. They are searching for the enigmatic Sunflower Samurai, but along the way they come across a collection of deceptive and insidious characters: ninjas, assassins, and even a prince in disguise. The journey proves to be nothing less than a roller coaster ride of battles, danger, desperation and companionship! Finally, here is a fresh spin on the samurai genre, based on the hit anime that Anime Insider calls, "The spiritual sequel to Cowboy Bebop!"

The Rootkit Arsenal: Escape and Evasion in the Dark Corners of the System


Bill Blunden - 2009
    Adopting an approach that favors full disclosure, The Rootkit Arsenal presents the most accessible, timely, and complete coverage of rootkit technology. This book covers more topics, in greater depth, than any other currently available. In doing so the author forges through the murky back alleys of the Internet, shedding light on material that has traditionally been poorly documented, partially documented, or intentionally undocumented.The spectrum of topics covered includes how to:* Hook kernel structures on multi-processor systems* Use a kernel debugger to reverse system internals* Inject call gates to create a back door into Ring-0* Use detour patches to sidestep group policy* Modify privilege levels on Vista by altering kernel objects* Utilize bootkit technology* Defeat live incident response and post-mortem forensics* Implement code armoring to protect your deliverables* Establish covert channels using the WSK and NDIS 6.0

This Machine Kills Secrets: How WikiLeakers, Cypherpunks, and Hacktivists Aim to Free the World's Information


Andy Greenberg - 2012
    WikiLeaks brought to light a new form of whistleblowing, using powerful cryptographic code to hide leakers’ identities while they spill the private data of government agencies and corporations. But that technology has been evolving for decades in the hands of hackers and radical activists, from the libertarian enclaves of Northern California to Berlin to the Balkans. And the secret-killing machine continues to evolve beyond WikiLeaks, as a movement of hacktivists aims to obliterate the world’s institutional secrecy.This is the story of the code and the characters—idealists, anarchists, extremists—who are transforming the next generation’s notion of what activism can be.With unrivaled access to such major players as Julian Assange, Daniel Domscheit-Berg, and WikiLeaks’ shadowy engineer known as the Architect, never before interviewed, reporter Andy Greenberg unveils the world of politically-motivated hackers—who they are and how they operate.

Cult of the Dead Cow: How the Original Hacking Supergroup Might Just Save the World


Joseph Menn - 2019
    Though until now it has remained mostly anonymous, its members invented the concept of hacktivism, released the top tool for testing password security, and created what was for years the best technique for controlling computers from afar, forcing giant companies to work harder to protect customers. They contributed to the development of Tor, the most important privacy tool on the net, and helped build cyberweapons that advanced US security without injuring anyone. With its origins in the earliest days of the Internet, the cDc is full of oddball characters -- activists, artists, even future politicians. Many of these hackers have become top executives and advisors walking the corridors of power in Washington and Silicon Valley. The most famous is former Texas Congressman and current presidential candidate Beto O'Rourke, whose time in the cDc set him up to found a tech business, launch an alternative publication in El Paso, and make long-shot bets on unconventional campaigns.Today, the group and its followers are battling electoral misinformation, making personal data safer, and battling to keep technology a force for good instead of for surveillance and oppression. Cult of the Dead Cow shows how governments, corporations, and criminals came to hold immense power over individuals and how we can fight back against them.

I Heart Logs: Event Data, Stream Processing, and Data Integration


Jay Kreps - 2014
    Even though most engineers don't think much about them, this short book shows you why logs are worthy of your attention.Based on his popular blog posts, LinkedIn principal engineer Jay Kreps shows you how logs work in distributed systems, and then delivers practical applications of these concepts in a variety of common uses--data integration, enterprise architecture, real-time stream processing, data system design, and abstract computing models.Go ahead and take the plunge with logs; you're going love them.Learn how logs are used for programmatic access in databases and distributed systemsDiscover solutions to the huge data integration problem when more data of more varieties meet more systemsUnderstand why logs are at the heart of real-time stream processingLearn the role of a log in the internals of online data systemsExplore how Jay Kreps applies these ideas to his own work on data infrastructure systems at LinkedIn

Hacker's Delight


Henry S. Warren Jr. - 2002
    Aiming to tell the dark secrets of computer arithmetic, this title is suitable for library developers, compiler writers, and lovers of elegant hacks.

Social Engineering: The Art of Human Hacking


Christopher Hadnagy - 2010
    Mitnick claims that this socialengineering tactic was the single-most effective method in hisarsenal. This indispensable book examines a variety of maneuversthat are aimed at deceiving unsuspecting victims, while it alsoaddresses ways to prevent social engineering threats.Examines social engineering, the science of influencing atarget to perform a desired task or divulge informationArms you with invaluable information about the many methods oftrickery that hackers use in order to gather information with theintent of executing identity theft, fraud, or gaining computersystem accessReveals vital steps for preventing social engineeringthreatsSocial Engineering: The Art of Human Hacking does itspart to prepare you against nefarious hackers--now you can doyour part by putting to good use the critical information withinits pages.

The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities


Mark Dowd - 2006
    Drawing on their extraordinary experience, they introduce a start-to-finish methodology for "ripping apart" applications to reveal even the most subtle and well-hidden security flaws.

Learn Windows PowerShell 3 in a Month of Lunches


Don Jones - 2011
    Just set aside one hour a day—lunchtime would be perfect—for a month, and you'll be automating Windows tasks faster than you ever thought possible. You'll start with the basics—what is PowerShell and what can you do with it. Then, you'll move systematically through the techniques and features you'll use to make your job easier and your day shorter. This totally revised second edition covers new PowerShell 3 features designed for Windows 8 and Windows Server 2012.Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.What's InsideLearn PowerShell from the beginning—no experience required! Covers PowerShell 3, Windows 8, and Windows Server 2012 Each lesson should take you one hour or lessAbout the TechnologyPowerShell is both a language and an administrative shell with which you can control and automate nearly every aspect of Windows. It accepts and executes commands immediately, and you can write scripts to manage most Windows servers like Exchange, IIS, and SharePoint.Experience with Windows administration is helpful. No programming experience is assumed.Table of ContentsBefore you begin Meet PowerShell Using the help system Running commands Working with providers The pipeline: connecting commands Adding commands Objects: data by another name The pipeline, deeper Formatting—and why it's done on the right Filtering and comparisons A practical interlude Remote control: one to one, and one to many Using Windows Management Instrumentation Multitasking with background jobs Working with many objects, one at a time Security alert! Variables: a place to store your stuff Input and output Sessions: remote control with less work You call this scripting? Improving your parameterized script Advanced remoting configuration Using regular expressions to parse text files Additional random tips, tricks, and techniques Using someone else's script Never the end PowerShell cheat sheet

Introduction to Machine Learning


Ethem Alpaydin - 2004
    Many successful applications of machine learning exist already, including systems that analyze past sales data to predict customer behavior, recognize faces or spoken speech, optimize robot behavior so that a task can be completed using minimum resources, and extract knowledge from bioinformatics data. "Introduction to Machine Learning" is a comprehensive textbook on the subject, covering a broad array of topics not usually included in introductory machine learning texts. It discusses many methods based in different fields, including statistics, pattern recognition, neural networks, artificial intelligence, signal processing, control, and data mining, in order to present a unified treatment of machine learning problems and solutions. All learning algorithms are explained so that the student can easily move from the equations in the book to a computer program. The book can be used by advanced undergraduates and graduate students who have completed courses in computer programming, probability, calculus, and linear algebra. It will also be of interest to engineers in the field who are concerned with the application of machine learning methods.After an introduction that defines machine learning and gives examples of machine learning applications, the book covers supervised learning, Bayesian decision theory, parametric methods, multivariate methods, dimensionality reduction, clustering, nonparametric methods, decision trees, linear discrimination, multilayer perceptrons, local models, hidden Markov models, assessing and comparing classification algorithms, combining multiple learners, and reinforcement learning.

Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference


Cameron Davidson-Pilon - 2014
    However, most discussions of Bayesian inference rely on intensely complex mathematical analyses and artificial examples, making it inaccessible to anyone without a strong mathematical background. Now, though, Cameron Davidson-Pilon introduces Bayesian inference from a computational perspective, bridging theory to practice-freeing you to get results using computing power. Bayesian Methods for Hackers illuminates Bayesian inference through probabilistic programming with the powerful PyMC language and the closely related Python tools NumPy, SciPy, and Matplotlib. Using this approach, you can reach effective solutions in small increments, without extensive mathematical intervention. Davidson-Pilon begins by introducing the concepts underlying Bayesian inference, comparing it with other techniques and guiding you through building and training your first Bayesian model. Next, he introduces PyMC through a series of detailed examples and intuitive explanations that have been refined after extensive user feedback. You'll learn how to use the Markov Chain Monte Carlo algorithm, choose appropriate sample sizes and priors, work with loss functions, and apply Bayesian inference in domains ranging from finance to marketing. Once you've mastered these techniques, you'll constantly turn to this guide for the working PyMC code you need to jumpstart future projects. Coverage includes - Learning the Bayesian "state of mind" and its practical implications - Understanding how computers perform Bayesian inference - Using the PyMC Python library to program Bayesian analyses - Building and debugging models with PyMC - Testing your model's "goodness of fit" - Opening the "black box" of the Markov Chain Monte Carlo algorithm to see how and why it works - Leveraging the power of the "Law of Large Numbers" - Mastering key concepts, such as clustering, convergence, autocorrelation, and thinning - Using loss functions to measure an estimate's weaknesses based on your goals and desired outcomes - Selecting appropriate priors and understanding how their influence changes with dataset size - Overcoming the "exploration versus exploitation" dilemma: deciding when "pretty good" is good enough - Using Bayesian inference to improve A/B testing - Solving data science problems when only small amounts of data are available Cameron Davidson-Pilon has worked in many areas of applied mathematics, from the evolutionary dynamics of genes and diseases to stochastic modeling of financial prices. His contributions to the open source community include lifelines, an implementation of survival analysis in Python. Educated at the University of Waterloo and at the Independent University of Moscow, he currently works with the online commerce leader Shopify.

Penetration Testing: A Hands-On Introduction to Hacking


Georgia Weidman - 2014
    This beginner-friendly book opens with some basics of programming and helps you navigate Kali Linux, an operating system that comes preloaded with useful computer security tools like Wireshark and Metasploit. You'll learn about gathering information on a target, social engineering, capturing network traffic, analyzing vulnerabilities, developing exploits, and more. Hands-on examples discuss even advanced topics like mobile device security and bypassing anti-virus software.