My goal is to accompany a reader who is starting to study this programming language, guiding her through the basic concepts and then moving on to data mining. We will begin by explaining how to use Python and its structures, how to install Python, and which tools are best suited to a data analyst's work, and then switch to an introduction to data mining packages. The book is, in any case, an introduction. Its aim is not, for instance, to fully explain topics such as machine learning or statistics with this programming language, which would take at least two or three times as many pages as this entire book. The aim is to provide guidance from the first programming steps with Python, through the import and manipulation of datasets, to some examples of data analysis.
To be more precise, in the Getting Started section, we will run through some basic installation concepts, the tools available for programming in Python, the differences between Python 2 and Python 3, and how to set up a work folder.
In Chapter 1, we will begin to see some basic concepts about creating objects, entering comments, and reserved words, as well as the various types of operators that are part of the grammar of this programming language.
In Chapter 2, we will carry on with the basic Python structures, such as tuples, lists, dictionaries, sets, strings, and files, and learn how to create and convert them.
In Chapter 3, we will see the basics of creating small functions and how to save them.
Chapter 4 deals with conditional instructions, which allow us to extend the power of a function, as well as with some important functions.
In Chapter 5, we will continue with some basic concepts related to object-oriented programming, including the concepts of module and method, and with error handling.
Chapter 6 is dedicated to importing files using some of the basic features. We will see how to open and edit files in plain text, in .csv format, and in various other formats.
Chapters 7 to 10 will deal with Python's most important data mining packages: NumPy and SciPy for mathematical functions and random data generation, pandas for dataframe management and data import, Matplotlib for drawing charts, and scikit-learn for machine learning. With regard to scikit-learn, we will limit ourselves to giving a basic idea of the code for the various algorithms; given the complexity of the subject, we will not go into the details of the individual techniques.
Finally, in the Conclusions, we will summarize the topics and concepts of the book, look at how to handle dates, and point to some of the data sources for our tests with Python.
This book is intended for those who want to approach the Python programming language from a data analysis perspective. We will therefore focus on the most widely used packages for data analysis, after an introduction to Python's basic concepts.