[BUG] parseStream() does not work as expected when there is invalid CSV rows #1093

Open
ajay-psd opened this issue Feb 6, 2025 · 0 comments
ajay-psd commented Feb 6, 2025

Describe the bug
When parsing a CSV data file streamed from an S3 bucket, parsing stops immediately as soon as a stream chunk contains an invalid row. The expected behavior is to continue parsing after encountering invalid data.

Sample invalid data entry in the file:

```
"xxhTT5lwV","[email protected]","919582162103","62103"22","621","Samsung"
```

Parsing works with data files that do not contain invalid data, but it fails with invalid data.

When handling invalid data, it does not adhere to the options passed:
`{ header: true, strictColumnHandling: true, discardUnmappedColumns: true }`

Parsing or Formatting?

  • Parsing

The code snippet I have used:

```js
const filename = 'uploads/FYAP6BA7/my_file-1738837410345-372923559.csv';
const bucket = 'test-app';
const params = {
  Bucket: bucket,
  Key: filename,
};

// Stream the object straight out of S3 into the CSV parser
const stream = s3.getObject(params).createReadStream();

csv.parseStream(stream, { header: true, strictColumnHandling: true, discardUnmappedColumns: true })
  .on('error', error => console.error(error))
  .on('data', row => console.log(JSON.stringify(row)))
  .on('data-invalid', row => console.log(JSON.stringify(row)))
  .on('end', rowCount => console.log(rowCount));
```

Sample data in the S3 file:

```
"name","email","sms","token","code","label"
"xxhTT5lwV","[email protected]","919582162103","62103"22","621","Samsung"
"0ENexk5cj","[email protected]","919194880301","621034"444","803","Testing"
"NOG1TQ8Cz","[email protected]","919619960375","621035555","603","Sony"
"gcscciBK8","[email protected]","919037631672","621036666","316","LG"
```

The first and second data rows contain invalid data (a stray extra quote).
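For reference (this is my reading of RFC 4180, not something the fast-csv docs state about this case): a literal double quote inside a quoted field must be escaped by doubling it, so a producer would have to write the broken field above as `"62103""22"`. A stray single quote makes the row a quoting syntax error rather than a row with the wrong column count, which may be why it surfaces on `error` instead of `data-invalid`:

```javascript
// Sketch of RFC 4180-style field quoting: embedded quotes are doubled,
// then the whole field is wrapped in quotes.
function quoteCsvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

console.log(quoteCsvField('62103"22')); // prints "62103""22"
```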


Expected behavior
The expected behavior is to continue parsing after encountering invalid data; per the documentation, invalid data should be handled via the `data-invalid` event and parsing should continue.

Screenshots

(screenshots attached)

Desktop (please complete the following information):

  • OS: MacOS
  • OS Version: Sequoia 15.2
  • Node Version: v22.12.0
  • fast-csv version: ^5.0.0 and ^5.0.2

Am I doing anything wrong, or missing something in the code?

@ajay-psd ajay-psd added the bug label Feb 6, 2025
@ajay-psd ajay-psd changed the title [BUG] parseStream() does not work as expected when the is invalid rows [BUG] parseStream() does not work as expected when there is invalid CSV rows Feb 10, 2025