Book picks similar to
Modern Information Retrieval by Ricardo Baeza-Yates


computer-science
information-retrieval
computer
data-science

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement


Eric Redmond - 2012
    As a modern application developer you need to understand the emerging field of data management, both RDBMS and NoSQL. Seven Databases in Seven Weeks takes you on a tour of some of the hottest open source databases today. In the tradition of Bruce A. Tate's Seven Languages in Seven Weeks, this book goes beyond your basic tutorial to explore the essential concepts at the core of each technology: Redis, Neo4J, CouchDB, MongoDB, HBase, Riak, and Postgres. With each database, you'll tackle a real-world data problem that highlights the concepts and features that make it shine. You'll explore the five data models employed by these databases (relational, key/value, columnar, document, and graph) and which kinds of problems are best suited to each. You'll learn how MongoDB and CouchDB are strikingly different, and discover the Dynamo heritage at the heart of Riak. Make your applications faster with Redis and more connected with Neo4J. Use MapReduce to solve Big Data problems. Build clusters of servers using scalable services like Amazon's Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that's more than the sum of its parts, or find one that meets all your needs at once. Seven Databases in Seven Weeks will take you on a deep dive into each of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.
What You Need: To get the most out of this book you'll have to follow along, and that means you'll need a *nix shell (Mac OS X or Linux preferred; Windows users will need Cygwin), Java 6 or greater, and Ruby 1.8.7 or greater. Each chapter will list the downloads required for that database.

JavaScript: The Good Parts


Douglas Crockford - 2008
    This authoritative book scrapes away the bad parts of JavaScript to reveal a subset that's more reliable, readable, and maintainable than the language as a whole, a subset you can use to create truly extensible and efficient code. Considered the JavaScript expert by many people in the development community, author Douglas Crockford identifies the abundance of good ideas that make JavaScript an outstanding object-oriented programming language, ideas such as functions, loose typing, dynamic objects, and an expressive object literal notation. Unfortunately, these good ideas are mixed in with bad and downright awful ideas, like a programming model based on global variables. When Java applets failed, JavaScript became the language of the Web by default, making its popularity almost completely independent of its qualities as a programming language. In JavaScript: The Good Parts, Crockford finally digs through the steaming pile of good intentions and blunders to give you a detailed look at all the genuinely elegant parts of JavaScript, including syntax, objects, functions, inheritance, arrays, regular expressions, methods, style, and beautiful features. The real beauty? As you move ahead with the subset of JavaScript that this book presents, you'll also sidestep the need to unlearn all the bad parts. Of course, if you want to find out more about the bad parts and how to use them badly, simply consult any other JavaScript book. With JavaScript: The Good Parts, you'll discover a beautiful, elegant, lightweight and highly expressive language that lets you create effective code, whether you're managing object libraries or just trying to get Ajax to run fast. If you develop sites or applications for the Web, this book is an absolute must.

Linux Kernel Development


Robert Love - 2003
    The book details the major subsystems and features of the Linux kernel, including its design, implementation, and interfaces. It covers the Linux kernel with both a practical and theoretical eye, which should appeal to readers with a variety of interests and needs. The author, a core kernel developer, shares valuable knowledge and experience on the 2.6 Linux kernel. Specific topics covered include process management, scheduling, time management and timers, the system call interface, memory addressing, memory management, the page cache, the VFS, kernel synchronization, portability concerns, and debugging techniques. This book covers the most interesting features of the Linux 2.6 kernel, including the CFS scheduler, preemptive kernel, block I/O layer, and I/O schedulers. The third edition of Linux Kernel Development includes new and updated material throughout the book: an all-new chapter on kernel data structures; details on interrupt handlers and bottom halves; extended coverage of virtual memory and memory allocation; tips on debugging the Linux kernel; in-depth coverage of kernel synchronization and locking; and useful insight into submitting kernel patches and working with the Linux kernel community.

Building Microservices: Designing Fine-Grained Systems


Sam Newman - 2014
    But developing microservice systems brings its own set of headaches. With lots of examples and practical advice, this book takes a holistic view of the topics that system architects and administrators must consider when building, managing, and evolving microservice architectures. Microservice technologies are moving quickly. Author Sam Newman provides you with a firm grounding in the concepts while diving into current solutions for modeling, integrating, testing, deploying, and monitoring your own autonomous services. You'll follow a fictional company throughout the book to learn how building a microservice architecture affects a single domain. You'll discover how microservices allow you to align your system design with your organization's goals; learn options for integrating a service with the rest of your system; take an incremental approach when splitting monolithic codebases; deploy individual microservices through continuous integration; examine the complexities of testing and monitoring distributed services; manage security with user-to-service and service-to-service models; and understand the challenges of scaling microservice architectures.

Big Data: A Revolution That Will Transform How We Live, Work, and Think


Viktor Mayer-Schönberger - 2013
    “Big data” refers to our burgeoning ability to crunch vast collections of information, analyze it instantly, and draw sometimes profoundly surprising conclusions from it. This emerging science can translate myriad phenomena—from the price of airline tickets to the text of millions of books—into searchable form, and uses our increasing computing power to unearth epiphanies that we never could have seen before. A revolution on par with the Internet or perhaps even the printing press, big data will change the way we think about business, health, politics, education, and innovation in the years to come. It also poses fresh threats, from the inevitable end of privacy as we know it to the prospect of being penalized for things we haven’t even done yet, based on big data’s ability to predict our future behavior. In this brilliantly clear, often surprising work, two leading experts explain what big data is, how it will change our lives, and what we can do to protect ourselves from its hazards. Big Data is the first big book about the next big thing. www.big-data-book.com

The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations


Gene Kim - 2015
    For decades, technology leaders have struggled to balance agility, reliability, and security. The consequences of failure have never been greater, whether it's the healthcare.gov debacle, cardholder data breaches, or missing the boat with Big Data in the cloud. And yet, high performers using DevOps principles, such as Google, Amazon, Facebook, Etsy, and Netflix, are routinely and reliably deploying code into production hundreds, or even thousands, of times per day. Following in the footsteps of The Phoenix Project, The DevOps Handbook shows leaders how to replicate these incredible outcomes, by showing how to integrate Product Management, Development, QA, IT Operations, and Information Security to elevate your company and win in the marketplace.
Table of Contents
Preface
Spreading the Aha! Moment
Introduction
Part I: The Three Ways
1. Agile, Continuous Delivery, and the Three Ways
2. The First Way: The Principles of Flow
3. The Second Way: The Principles of Feedback
4. The Third Way: The Principles of Continual Learning
Part II: Where to Start
5. Selecting Which Value Stream to Start With
6. Understanding the Work in Our Value Stream…
7. How to Design Our Organization and Architecture
8. How to Get Great Outcomes by Integrating Operations into the Daily Work of Development
Part III: The First Way: The Technical Practices of Flow
9. Create the Foundations of Our Deployment Pipeline
10. Enable Fast and Reliable Automated Testing
11. Enable and Practice Continuous Integration
12. Automate and Enable Low-Risk Releases
13. Architect for Low-Risk Releases
Part IV: The Second Way: The Technical Practices of Feedback
14. Create Telemetry to Enable Seeing and Solving Problems
15. Analyze Telemetry to Better Anticipate Problems
16. Enable Feedback So Development and Operations Can Safely Deploy Code
17. Integrate Hypothesis-Driven Development and A/B Testing into Our Daily Work
18. Create Review and Coordination Processes to Increase Quality of Our Current Work
Part V: The Third Way: The Technical Practices of Continual Learning
19. Enable and Inject Learning into Daily Work
20. Convert Local Discoveries into Global Improvements
21. Reserve Time to Create Organizational Learning
22. Information Security as Everyone's Job, Every Day
23. Protecting the Deployment Pipeline
Part VI: Conclusion
A Call to Action: Conclusion to The DevOps Handbook
Appendices
1. The Convergence of DevOps
2. The Theory of Constraints and Core Chronic Conflicts
3. Tabular Form of Downward Spiral
4. The Dangers of Handoffs and Queues
5. Myths of Industrial Safety
6. The Toyota Andon Cord
7. COTS Software
8. Post-Mortem Meetings
9. The Simian Army
10. Transparent Uptime
Additional Resources
Endnotes

Perl Best Practices: Standards and Styles for Developing Maintainable Code


Damian Conway - 2005
    Many programmers aren't conscious of all the choices they make, like how they format their source, the names they use for variables, or the kinds of loops they use. They're focused entirely on problems they're solving, solutions they're creating, and algorithms they're implementing. So they write code in the way that seems natural, that happens intuitively, and that feels good. But if you're serious about your profession, intuition isn't enough. Perl Best Practices author Damian Conway explains that rules, conventions, standards, and practices not only help programmers communicate and coordinate with one another, they also provide a reliable framework for thinking about problems, and a common language for expressing solutions. This is especially critical in Perl, because the language is designed to offer many ways to accomplish the same task, and consequently it supports many incompatible dialects. With a good dose of Aussie humor, Dr. Conway (familiar to many in the Perl community) offers 256 guidelines on the art of coding to help you write better Perl code -- in fact, the best Perl code you possibly can. The guidelines cover code layout, naming conventions, choice of data and control structures, program decomposition, interface design and implementation, modularity, object orientation, error handling, testing, and debugging. They're designed to work together to produce code that is clear, robust, efficient, maintainable, and concise, but Dr. Conway doesn't pretend that this is the one true universal and unequivocal set of best practices. Instead, Perl Best Practices offers coherent and widely applicable suggestions based on real-world experience of how code is actually written, rather than on someone's ivory-tower theories on how software ought to be created. Most of all, Perl Best Practices offers guidelines that actually work, and that many developers around the world are already using. Much like Perl itself, these guidelines are about helping you to get your job done, without getting in the way.
Praise for Perl Best Practices from Perl community members:
"As a manager of a large Perl project, I'd ensure that every member of my team has a copy of Perl Best Practices on their desk, and use it as the basis for an in-house style guide." -- Randal Schwartz
"There are no more excuses for writing bad Perl programs. All levels of Perl programmer will be more productive after reading this book." -- Peter Scott
"Perl Best Practices will be the next big important book in the evolution of Perl. The ideas and practices Damian lays down will help bring Perl out from under the embarrassing heading of "scripting languages". Many of us have known Perl is a real programming language, worthy of all the tasks normally delegated to Java and C++. With Perl Best Practices, Damian shows specifically how and why, so everyone else can see, too." -- Andy Lester
"Damian's done what many thought impossible: show how to build large, maintainable Perl applications, while still letting Perl be the powerful, expressive language that programmers have loved for years." -- Bill Odom
"Finally, a means to bring lasting order to the process and product of real Perl development teams." -- Andrew Sundstrom
"Perl Best Practices provides a valuable education in how to write robust, maintainable Perl…

Information Theory, Inference and Learning Algorithms


David J.C. MacKay - 2002
    Information theory, inference, and learning algorithms lie at the heart of many exciting areas of contemporary science and engineering - communication, signal processing, data mining, machine learning, pattern recognition, computational neuroscience, bioinformatics, and cryptography. This textbook introduces theory in tandem with applications. Information theory is taught alongside practical communication systems, such as arithmetic coding for data compression and sparse-graph codes for error-correction. A toolbox of inference techniques, including message-passing algorithms, Monte Carlo methods, and variational approximations, is developed alongside applications of these tools to clustering, convolutional codes, independent component analysis, and neural networks. The final part of the book describes the state of the art in error-correcting codes, including low-density parity-check codes, turbo codes, and digital fountain codes -- the twenty-first century standards for satellite communications, disk drives, and data broadcast. Richly illustrated, filled with worked examples and over 400 exercises, some with detailed solutions, David MacKay's groundbreaking book is ideal for self-learning and for undergraduate or graduate courses. Interludes on crosswords, evolution, and sex provide entertainment along the way. In sum, this is a textbook on information, communication, and coding for a new generation of students, and an unparalleled entry point into these subjects for professionals in areas as diverse as computational biology, financial engineering, and machine learning.

Envisioning Information


Edward R. Tufte - 1990
    The Whole Earth Review called Envisioning Information a "passionate, elegant revelation."

Modern Operating Systems


Andrew S. Tanenbaum - 1992
    What makes an operating system modern? According to author Andrew Tanenbaum, it is the awareness of high-demand computer applications--primarily in the areas of multimedia, parallel and distributed computing, and security. The development of faster and more advanced hardware has driven progress in software, including enhancements to the operating system. It is one thing to run an old operating system on current hardware, and another to effectively leverage current hardware to best serve modern software applications. If you don't believe it, install Windows 3.0 on a modern PC and try surfing the Internet or burning a CD.
Readers familiar with Tanenbaum's previous text, Operating Systems, know the author is a great proponent of simple design and hands-on experimentation. His earlier book came bundled with the source code for an operating system called Minix, a simple variant of Unix and the platform used by Linus Torvalds to develop Linux. Although this book does not come with any source code, he illustrates many of his points with code fragments (C, usually with Unix system calls).
The first half of Modern Operating Systems focuses on traditional operating systems concepts: processes, deadlocks, memory management, I/O, and file systems. There is nothing groundbreaking in these early chapters, but all topics are well covered, each including sections on current research and a set of student problems. It is enlightening to read Tanenbaum's explanations of the design decisions made by past operating systems gurus, including his view that additional research on the problem of deadlocks is impractical except for "keeping otherwise unemployed graph theorists off the streets."
It is the second half of the book that differentiates itself from older operating systems texts. Here, each chapter describes an element of what constitutes a modern operating system--awareness of multimedia applications, multiple processors, computer networks, and a high level of security. The chapter on multimedia functionality focuses on such features as handling massive files and providing video-on-demand. Included in the discussion on multiprocessor platforms are clustered computers and distributed computing. Finally, the importance of security is discussed--a lively enumeration of the scores of ways operating systems can be vulnerable to attack, from password security to computer viruses and Internet worms.
Included at the end of the book are case studies of two popular operating systems: Unix/Linux and Windows 2000. There is a bias toward the Unix/Linux approach, not surprising given the author's experience and academic bent, but this bias does not detract from Tanenbaum's analysis. Both operating systems are dissected, describing how each implements processes, file systems, memory management, and other operating system fundamentals.
Tanenbaum's mantra is simple, accessible operating system design. Given that modern operating systems have extensive features, he is forced to reconcile physical size with simplicity. Toward this end, he makes frequent references to the Frederick Brooks classic The Mythical Man-Month for wisdom on managing large, complex software development projects. He finds both Windows 2000 and Unix/Linux guilty of being too complicated--with a particular skewering of Windows 2000 and its "mammoth Win32 API." A primary culprit is the attempt to make operating systems more "user-friendly," which Tanenbaum views as an excuse for bloated code.
The solution is to have smart people, the smallest possible team, and well-defined interactions between various operating systems components. Future operating system design will benefit if the advice in this book is taken to heart. --Pete Ostenson

The Elements of Statistical Learning: Data Mining, Inference, and Prediction


Trevor Hastie - 2001
    With the explosion in computation and information technology have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It should be a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting—the first comprehensive treatment of this topic in any book. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie wrote much of the statistical modeling software in S-PLUS and invented principal curves and surfaces. Tibshirani proposed the Lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, and projection pursuit.

Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL


Dean Allemang - 2008
    There are many sources for basic information on the extensions to the WWW that permit content to be expressed in natural language yet used by software agents to easily find, share, and integrate information. Until now, however, individuals engaged in creating ontologies -- formal descriptions of the concepts, terms, and relationships within a given knowledge domain -- have had no sources beyond the technical standards documents. Semantic Web for the Working Ontologist transforms this information into the practical knowledge that programmers and subject domain experts need. Authors Allemang and Hendler begin with solutions to the basic problems, but don't stop there: they demonstrate how to develop your own solutions to problems of increasing complexity and ensure that your skills will keep pace with the continued evolution of the Semantic Web.

Implementing Domain-Driven Design


Vaughn Vernon - 2013
    Vaughn Vernon couples guided approaches to implementation with modern architectures, highlighting the importance and value of focusing on the business domain while balancing technical considerations. Building on Eric Evans’ seminal book, Domain-Driven Design, the author presents practical DDD techniques through examples from familiar domains. Each principle is backed up by realistic Java examples (all applicable to C# developers), and all content is tied together by a single case study: the delivery of a large-scale Scrum-based SaaS system for a multitenant environment. The author takes you far beyond “DDD-lite” approaches that embrace DDD solely as a technical toolset, and shows you how to fully leverage DDD’s “strategic design patterns” using Bounded Context, Context Maps, and the Ubiquitous Language. Using these techniques and examples, you can reduce time to market and improve quality, as you build software that is more flexible, more scalable, and more tightly aligned to business goals.

Automate the Boring Stuff with Python: Practical Programming for Total Beginners


Al Sweigart - 2014
    What if you could have your computer do tedious, repetitive tasks for you? In "Automate the Boring Stuff with Python," you'll learn how to use Python to write programs that do in minutes what would take you hours to do by hand, with no prior programming experience required. Once you've mastered the basics of programming, you'll create Python programs that effortlessly perform useful and impressive feats of automation: search for text in a file or across multiple files; create, update, move, and rename files and folders; search the Web and download online content; update and format data in Excel spreadsheets of any size; split, merge, watermark, and encrypt PDFs; send reminder emails and text notifications; and fill out online forms. Step-by-step instructions walk you through each program, and practice projects at the end of each chapter challenge you to improve those programs and use your newfound skills to automate similar tasks. Don't spend your time doing work a well-trained monkey could do. Even if you've never written a line of code, you can make your computer do the grunt work. Learn how in "Automate the Boring Stuff with Python."
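To give a flavor of the kind of script the book builds toward, here is a minimal Python sketch (not taken from the book; the folder name "notes" and the phrase "TODO" are made-up placeholders) that reports which .txt files in a folder contain a given phrase:

```python
# Minimal sketch, not from the book: find which .txt files in a folder
# mention a phrase. "notes" and "TODO" are illustrative placeholders.
from pathlib import Path

def find_phrase(folder, phrase):
    """Yield (filename, line number, line) for every line containing phrase."""
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if phrase in line:
                yield path.name, lineno, line.strip()

if __name__ == "__main__":
    for name, lineno, line in find_phrase("notes", "TODO"):
        print(f"{name}:{lineno}: {line}")
```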

Deep Learning


Ian Goodfellow - 2016
    Deep learning enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.