| 1 | + |
| 2 | +# CONTRIBUTION_PLAN.md |
| 3 | + |
| 4 | +## 1. Basic Information |
| 5 | + |
| 6 | +- **Project Name:** pandas |
| 7 | +- **GitHub URL:** [https://github.com/pandas-dev/pandas](https://github.com/pandas-dev/pandas) |
| 8 | +- **Primary Language(s):** Python, with some Cython |
| 9 | +- **Project Purpose:** |
| 10 | + Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation library built on top of Python. It provides data structures like `Series` (1D) and `DataFrame` (2D) that simplify working with structured data. Pandas is widely used in data science, analytics, machine learning pipelines, and scientific computing. |
| 11 | + |
| 12 | +--- |
| 13 | + |
| 14 | +## 2. Contribution Guidelines |
| 15 | + |
| 16 | +- **Is there a CONTRIBUTING.md file?** |
| 17 | + No. The project uses a `contributing.rst` file (under `doc/source/development/`) instead of `CONTRIBUTING.md`. This file gives detailed instructions for contributing to pandas, including: |
| 18 | + - Types of contributions accepted: bug fixes, documentation, enhancements, and suggestions. |
| 19 | + - Instructions to pick issues labeled "good first issue" or "Docs". |
| 20 | + - Version control workflow: Fork → Clone → Create Branch → Make Changes → Pull Request (PR). |
| 21 | + - Environment setup with conda and regular syncing with the main branch. |
| 22 | + - Guidelines for writing good commit messages, referencing issues, and ensuring tests pass. |
| 23 | + |
| 24 | +- **Is there a Code of Conduct?** |
| 25 | + Yes. The pandas project follows a [Code of Conduct](https://github.com/pandas-dev/pandas/blob/main/CODE_OF_CONDUCT.md) to ensure a harassment-free, inclusive, and welcoming environment for everyone. |
| 26 | + |
| 27 | +- **Is a CLA (Contributor License Agreement) needed?** |
| 28 | + ❌ No. Contributors are expected to comply with the open-source license (BSD 3-Clause) and follow the project's contribution and conduct guidelines. |
| 29 | + |
| 30 | +- **Are first-time contributors welcomed?** |
| 31 | + ✅ Yes. The project actively encourages contributions from first-timers and provides clear instructions for onboarding. |
| 32 | + |
| 33 | +--- |
| 34 | + |
| 35 | +## 3. Environment Setup |
| 36 | + |
| 37 | +### Steps to Set Up Locally: |
| 38 | +1. Fork the repository on GitHub to your account. |
| 39 | +2. Clone the repository locally: |
| 40 | + ```bash |
| 41 | + git clone https://github.com/<your-username>/pandas.git |
| 42 | + cd pandas |
| 43 | + ``` |
| 44 | +3. Create and activate a development environment using conda: |
| 45 | + ```bash |
| 46 | + conda create -n devenv python=3.10 |
| 47 | + conda activate devenv |
| 48 | + ``` |
| 49 | +4. Install development dependencies: |
| 50 | + ```bash |
| 51 | + pip install -r requirements-dev.txt |
| 52 | + ``` |
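| 53 | +5. Build and install pandas in editable mode. Since pandas 2.1 the build uses Meson, and the contributing guide recommends this command rather than the legacy `python setup.py build_ext --inplace`: |
| 54 | + ```bash |
| 55 | + python -m pip install -ve . --no-build-isolation --config-settings editable-verbose=true |
| 56 | + ``` |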
| 57 | +6. (Optional but recommended) Run the test suite to validate your environment: |
| 58 | + ```bash |
| 59 | + pytest pandas/tests/ |
| 60 | + ``` |
| 61 | + |
| 62 | +--- |
| 63 | + |
| 64 | +## 4. Making a Contribution |
| 65 | + |
| 66 | +- **Open Issue Chosen:** |
| 67 | + [BUG: Implicit conversion to float64 with isin() #61676](https://github.com/pandas-dev/pandas/issues/61676) |
| 68 | + |
| 69 | +- **Issue Summary:** |
| 70 | + Using `isin()` on a DataFrame column with a value of type `np.uint64` triggers an implicit conversion to `float64`. Because `float64` cannot represent every 64-bit integer exactly, the membership check can return incorrect results. |
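
To see why an implicit cast to `float64` is a problem for `uint64` data, here is a minimal pure-Python sketch. It does not call pandas; it only relies on the fact that Python's `float` is an IEEE 754 double, the same representation as `float64`. The specific values are illustrative assumptions and are not taken from the issue.

```python
# Python's float is an IEEE 754 double (the same format as float64).
# It has a 53-bit significand, so not every integer above 2**53 is representable.
UINT64_MAX = 2**64 - 1

a = UINT64_MAX        # largest uint64 value
b = UINT64_MAX - 1    # a different uint64 value

# As exact integers the two values are distinct.
distinct_as_int = a != b

# After conversion to float they collapse to the same number.
equal_as_float = float(a) == float(b)

# 2**53 + 1 is the smallest positive integer a double cannot represent.
first_lossy = 2**53 + 1
round_trip_changed = int(float(first_lossy)) != first_lossy

print(distinct_as_int, equal_as_float, round_trip_changed)
```

Any membership check that compares values after such a cast can therefore report a match between two different integers.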
| 71 | + |
| 72 | +### Steps to Resolve the Issue: |
| 73 | +1. Reproduce the issue locally by writing a minimal test case. |
| 74 | +2. Implement a fix by ensuring consistent data types between the DataFrame column and the values passed to `isin()`: |
| 75 | + - **Approach 1:** Convert the `isin()` argument to match the column's dtype. |
| 76 | + - **Approach 2:** Convert the DataFrame column to `uint64` if needed. |
| 77 | +3. Add appropriate unit tests under `pandas/tests/`. |
| 78 | +4. Ensure all existing and new tests pass using `pytest`. |
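
Approach 1 can be modeled without pandas. The sketch below is a pure-Python stand-in, not the pandas implementation: `isin_via_float` mimics a membership check that casts both sides to float, while `isin_exact` first coerces the candidate values to integers in the `uint64` range and then compares exactly. The function names and the helper `to_uint64_or_none` are hypothetical and exist only for illustration.

```python
UINT64_MAX = 2**64 - 1

def to_uint64_or_none(value):
    """Return value as an exact int if it fits in uint64, else None (hypothetical helper)."""
    if isinstance(value, int):
        return value if 0 <= value <= UINT64_MAX else None
    # Accept floats only when they hold an exact, in-range whole number.
    if isinstance(value, float) and value.is_integer() and 0 <= value < 2.0**64:
        return int(value)
    return None

def isin_via_float(column, values):
    """Lossy model: both sides are cast to float before comparing."""
    lookup = {float(v) for v in values}
    return [float(x) in lookup for x in column]

def isin_exact(column, values):
    """Model of Approach 1: coerce the argument to the column's integer type first."""
    lookup = {c for c in map(to_uint64_or_none, values) if c is not None}
    return [x in lookup for x in column]

column = [1, 2**53, 2**53 + 1, UINT64_MAX]
values = [2**53, UINT64_MAX - 1]

print(isin_via_float(column, values))  # [False, True, True, True]
print(isin_exact(column, values))      # [False, True, False, False]
```

The float-based model reports two false positives for values that differ from the candidates only in their low bits. A regression test for the real fix could use the same kind of inputs, adapted to whatever the issue's reproduction actually shows.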
| 79 | + |
| 80 | +--- |
| 81 | + |
| 82 | +## 5. Create a Pull Request Plan |
| 83 | + |
| 84 | +### Pull Request Workflow: |
| 85 | +1. Create a new feature branch: |
| 86 | + ```bash |
| 87 | + git checkout -b fix-isin-float64-conversion |
| 88 | + ``` |
| 89 | +2. Make the required code changes in the appropriate files. |
| 90 | +3. Add and commit the changes: |
| 91 | + ```bash |
| 92 | + git add . |
| 93 | + git commit -m "BUG: Fix implicit float64 conversion in isin (#61676)" |
| 94 | + ``` |
| 95 | +4. Push the changes to your fork: |
| 96 | + ```bash |
| 97 | + git push origin fix-isin-float64-conversion |
| 98 | + ``` |
| 99 | +5. Open a Pull Request in GitHub from your branch to `pandas-dev/pandas:main`. |
| 100 | + |
| 101 | +### Example PR Title: |
| 102 | +``` |
| 103 | +BUG: Fix implicit float64 conversion in isin (#61676) |
| 104 | +``` |
| 105 | + |
| 106 | +### PR Description: |
| 107 | +``` |
| 108 | +This PR addresses issue #61676 where the use of np.uint64 with isin() leads to implicit float64 conversion, causing incorrect behavior. The fix ensures type consistency by either casting the value to match the column type or converting the column when appropriate. Additional unit tests are included to verify the fix. |
| 109 | +``` |
| 110 | + |
| 111 | +### Testing the Fix: |
| 112 | +- Run the test suite using `pytest pandas/tests/`. |
| 113 | +- Confirm that all tests pass and that the new tests cover the bug scenario. |