Stories about product, engineering, leadership, business, strategy, work organization, and more.

Featured Posts:

Written with ❀️ by @duponico 🐦️

Manipulate CSV Files in PHP

CSV for Comma-Separated Values is a well established and common file format to import and export data. Let's learn how to read and write CSV files in a modern and efficient way in PHP.

The Old-Fashioned Way

PHP contains native functions to read CSV content from a file pointer.

Read CSV rows from a file using `fgetcsv`:

CODE_BLOCK_LANGUAGE="php" $path = 'data/movies-100.csv'; if (($handle = fopen($path, "r")) !== false) { while (($data = fgetcsv($handle, 1000, ",")) !== false) { var_dump($data); } fclose($handle); }

Write CSV rows into a file using `fputcsv`:

CODE_BLOCK_LANGUAGE="php" $rows = [ ['id', 'title', 'poster', 'overview', 'release_date', 'genres'], [181808, "Star Wars: The Last Jedi", "https://image.tmdb.org/t/p/w500/kOVEVeg59E0wsnXmF9nrh6OmWII.jpg", "Rey develops her newly discovered abilities with the guidance of Luke Skywalker, who is unsettled by the strength of her powers. Meanwhile, the Resistance prepares to do battle with the First Order.", 1513123200, "Documentary"], [383498, "Deadpool 2", "https://image.tmdb.org/t/p/w500/to0spRl1CMDvyUbOnbb4fTk3VAd.jpg", "Wisecracking mercenary Deadpool battles the evil and powerful Cable and other bad guys to save a boy's life.", 1526346000, "Action, Comedy, Adventure"], [157336, "Interstellar", "https://image.tmdb.org/t/p/w500/gEU2QniE6E77NI6lCU6MxlNBvIx.jpg", "Interstellar chronicles the adventures of a group of explorers who make use of a newly discovered wormhole to surpass the limitations on human space travel and conquer the vast distances involved in an interstellar voyage.",1415145600,"Adventure, Drama, Science Fiction"] ]; $path = 'data/new-file.csv'; $fp = fopen($path, 'w'); foreach ($rows as $row) { fputcsv($fp, $row); } fclose($fp);

This approach is totally fine, but some modern libraries can make our life easier and our code more readable.

The Modern Way

Let's install and use the excellent league/csv library, which provides a clean and straightforward CSV manipulation API.

CODE_BLOCK_LANGUAGE="shell" composer require league/csv

Read CSV rows from a file using the`Reader`:

CODE_BLOCK_LANGUAGE="php" use League\Csv\Reader; $path = 'data/movies-100.csv'; $csv = Reader::createFromPath($path, 'r'); $csv->setHeaderOffset(0); // use the first line as headers for rows $header = $csv->getHeader(); var_dump($header); $rows = $csv->getRecords(); foreach ($rows as $row) { var_dump($row); }

Write CSV rows into a file using the`Writer`:

CODE_BLOCK_LANGUAGE="php" use League\Csv\Writer; $rows = [ ['id', 'title', 'poster', 'overview', 'release_date', 'genres'], [181808, "Star Wars: The Last Jedi", "https://image.tmdb.org/t/p/w500/kOVEVeg59E0wsnXmF9nrh6OmWII.jpg", "Rey develops her newly discovered abilities with the guidance of Luke Skywalker, who is unsettled by the strength of her powers. Meanwhile, the Resistance prepares to do battle with the First Order.", 1513123200, "Documentary"], [383498, "Deadpool 2", "https://image.tmdb.org/t/p/w500/to0spRl1CMDvyUbOnbb4fTk3VAd.jpg", "Wisecracking mercenary Deadpool battles the evil and powerful Cable and other bad guys to save a boy's life.", 1526346000, "Action, Comedy, Adventure"], [157336, "Interstellar", "https://image.tmdb.org/t/p/w500/gEU2QniE6E77NI6lCU6MxlNBvIx.jpg", "Interstellar chronicles the adventures of a group of explorers who make use of a newly discovered wormhole to surpass the limitations on human space travel and conquer the vast distances involved in an interstellar voyage.",1415145600,"Adventure, Drama, Science Fiction"] ]; $path = 'data/new-file2.csv'; $csv = Writer::createFromPath($path, 'w'); $csv->insertAll($rows);

Big Files and Memory Usage

The trick to parse big CSV files by keeping a low memory usage is toΒ never load all data in memory.

Luckily, both methods we explored allow us to iterate and stream the content.

Let's take an example of a file containing 1M of lines; this file weights almost 400MB.Β 

CODE_BLOCK_LANGUAGE="shell" -rw-r--r-- 1 nico nico 393M may 13 12:38 data/movies-1000000.csv

Let's load all its content:

CODE_BLOCK_LANGUAGE="php" // I'm using here the Symfony Console & StopWatch components $section = 'read_csv_file'; $this->stopwatch->start($section); $path = 'data/movies-1000000.csv'; $csv = Reader::createFromPath($path, 'r'); $csv->setHeaderOffset(0); $rows = $csv->getRecords(); foreach ($rows as $row) { // we do nothing, but we want to ensure we browse each row } $this->stopwatch->stop($section); $output->writeln("I read ".$csv->count()." rows from the CSV File ".$path); $output->writeln((string) $this->stopwatch->getEvent($section));

And here is the result:

CODE_BLOCK_LANGUAGE="shell" I read 1000000 rows from the CSV File data/movies-1000000.csv default/read_csv_file: 6.00 MiB - 26699 ms

We load our 1M lines using only 6MB of memory (I'm using PHP8).

By streaming properly the data reading and the processing you want to apply, you can keep the memory usage very low.

Extra Resources

You can find all the code and examples in this GitHub repository.

It's packaged as a simple Symfony project, a set of commands, it also comes with a Docker image. πŸ‹