000 04229nam a22005055i 4500
001 978-3-030-57592-2
003 DE-He213
005 20240423125126.0
007 cr nn 008mamaa
008 201109s2020 sz | s |||| 0|eng d
020 _a9783030575922
_9978-3-030-57592-2
024 7 _a10.1007/978-3-030-57592-2
_2doi
050 4 _aQA76.9.D3
072 7 _aUN
_2bicssc
072 7 _aCOM021000
_2bisacsh
072 7 _aUN
_2thema
082 0 4 _a005.74
_223
100 1 _aBadia, Antonio.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
245 1 0 _aSQL for Data Science
_h[electronic resource] :
_bData Cleaning, Wrangling and Analytics with Relational Databases /
_cby Antonio Badia.
250 _a1st ed. 2020.
264 1 _aCham :
_bSpringer International Publishing :
_bImprint: Springer,
_c2020.
300 _aXI, 285 p. 16 illus.
_bonline resource.
336 _atext
_btxt
_2rdacontent
337 _acomputer
_bc
_2rdamedia
338 _aonline resource
_bcr
_2rdacarrier
347 _atext file
_bPDF
_2rda
490 1 _aData-Centric Systems and Applications,
_x2197-974X
505 0 _a1. The Data Life Cycle -- 2. Relational Data -- 3. Data Cleaning and Pre-processing -- 4. Introduction to Data Analysis -- 5. More SQL -- 6. Databases and Other Tools.
520 _aThis textbook explains SQL within the context of data science and introduces the different parts of SQL as they are needed for the tasks usually carried out during data analysis. Using the framework of the data life cycle, it focuses on steps that traditional textbooks often give short shrift, such as data loading, cleaning and pre-processing. The book is organized as follows. Chapter 1 describes the data life cycle, i.e. the sequence of stages from data acquisition to archiving that data goes through as it is prepared and then actually analyzed, together with the different activities that take place at each stage. Chapter 2 gets into databases proper, explaining how relational databases organize data. Non-traditional data, like XML and text, are also covered. Chapter 3 introduces SQL queries, but unlike in traditional textbooks, queries and their parts are described around typical data analysis tasks like data exploration, cleaning and transformation. Chapter 4 introduces some basic techniques for data analysis and shows how SQL can be used for some simple analyses without too much complication. Chapter 5 introduces additional SQL constructs that are important in a variety of situations and thus completes the coverage of SQL queries. Lastly, Chapter 6 briefly explains how to use SQL from within R and Python programs. It focuses on how these languages can interact with a database, and how what has been learned about SQL can be leveraged to make life easier when using R or Python. All chapters contain many examples and exercises along the way, and readers are encouraged to install the two open-source database systems used throughout the book (MySQL and Postgres) in order to practice and work on the exercises, because simply reading the book is much less useful than actually using it. This book is for anyone interested in data science and/or databases.
It requires only a bit of computer fluency, but no specific background in databases or data analysis. All concepts are introduced intuitively and with a minimum of specialized jargon. After going through this book, readers should be able to profitably learn more about data mining, machine learning, and database management from more advanced textbooks and courses.
650 0 _aDatabase management.
650 0 _aQuantitative research.
650 1 4 _aDatabase Management.
650 2 4 _aData Analysis and Big Data.
710 2 _aSpringerLink (Online service)
773 0 _tSpringer Nature eBook
776 0 8 _iPrinted edition:
_z9783030575915
776 0 8 _iPrinted edition:
_z9783030575939
830 0 _aData-Centric Systems and Applications,
_x2197-974X
856 4 0 _uhttps://doi.org/10.1007/978-3-030-57592-2
912 _aZDB-2-SCS
912 _aZDB-2-SXCS
942 _cSPRINGER
999 _c174517
_d174517