This is the code repository for The Definitive Guide to Data Integration, published by Packt.
Unlock the power of data integration to efficiently manage, transform, and analyze data
The Definitive Guide to Data Integration is an indispensable resource for navigating the complexities of modern data integration. Focusing on the latest tools, techniques, and best practices, this guide helps you master data integration and unleash the full potential of your data.
This book covers the following exciting features:
- Discover the evolving architecture and technologies shaping data integration
- Process large data volumes efficiently with data warehousing
- Tackle the complexities of integrating large datasets from diverse sources
- Harness the power of data warehousing for efficient data storage and processing
- Design and optimize effective data integration solutions
- Explore data governance principles and compliance requirements
If you feel this book is for you, get your copy today!
The code will look like the following:
# Filter employees with salary greater than $50,000
filtered_employees_df = employees_df.filter(employees_df.salary > 50000)
Following is what you need for this book: This book is perfect for data engineers, data architects, data analysts, and IT professionals looking to gain a comprehensive understanding of data integration in the modern era. Whether you’re a beginner or an experienced professional enhancing your knowledge of the modern data stack, this definitive guide will help you navigate the data integration landscape.
The following is the list of software and hardware covered in the book (Chapters 1-16).
Chapter | Software required | OS required |
---|---|---|
1-16 | SQL and data transformation | Windows, macOS, or Linux |
1-16 | Massively parallel processing systems | Windows, macOS, or Linux |
1-16 | Spark for data transformation | Windows, macOS, or Linux |
1-16 | Data storage technologies (data warehouses, data lakes, | Windows, macOS, or Linux |
1-16 | Data modeling techniques | Windows, macOS, or Linux |
1-16 | Data integration models (ETL and ELT) | Windows, macOS, or Linux |
1-16 | Data exposition technologies (Streams, REST APIs, | Windows, macOS, or Linux |
Pierre-Yves Bonnefoy is a versatile data and cloud architect boasting over 20 years of experience across diverse technical and functional domains. With an extensive background in software development, systems and networks, data analytics, and data science, Pierre-Yves offers a comprehensive view of information systems. As the CEO of Olexya and CTO of Africa4Data, he dedicates his effort to delivering cutting-edge solutions for clients and promoting data-driven decision-making. As an active board member of French Tech Le Mans, Pierre-Yves enthusiastically supports the local tech ecosystem, fostering entrepreneurship and innovation while sharing his expertise with the next generation of tech leaders. You can contact him at [email protected].
Emeric Chaize, with over 16 years of experience in data management and cloud technology, demonstrates a profound knowledge of data platforms and their architecture, further exemplified by his role as president of Olexya, a data architecture company. His background in computer science and engineering, combined with hands-on experience, has honed his skills in understanding complex data architectures and implementing efficient data integration solutions. His work at various small and large companies has demonstrated his proficiency in implementing cloud-based data platforms and overseeing data-driven projects, making him highly suited for roles involving data platforms and data integration challenges. You can contact him at [email protected].
Raphaël Mansuy is a seasoned technology executive and entrepreneur with over 25 years of experience in software development, data engineering, and AI-driven solutions. As a founder of several companies, he has demonstrated success in designing and implementing mission-critical solutions for global enterprises, creating innovative technologies, and fostering business growth. Raphaël is highly skilled in AI, data engineering, DevOps, and cloud-native development, offering consultancy services to Fortune 500 companies and start-ups alike. He is passionate about enabling businesses to thrive using cutting-edge technologies and insights. You can contact him at [email protected].
Mehdi TAZI is a data and cloud architect with over 12 years of experience and the CEO of an IT consulting and investment company. He specializes in distributed information systems and data architecture. He navigates through both platform and application facets. Mehdi designs information systems architectures that answer customers’ needs by setting up technical, functional, and organizational solutions, as well as designing and coding in languages such as Java, Scala, or Python. You can contact him at [email protected]/tazimehdi.com.