Symfony

Clear your Doctrine Collections

Have you ever come across some unexpected behavior while developing with your favorite tools? Or better said: while developing with the tools you most often use and think you have a good in-depth knowledge of how these work?

Well, that’s the beauty in software development. No matter how many years of experience you have or how many projects you finish, every now and then you hit something that catches you off-guard and can make a fool of you.

Last week I was building some functionality for a friend of mine. He has a physical store with goods, and sometimes the balance of a certain day must be recalculated; that means merchandise with enter/exit prices, cash, debts, and payments. The requirement was to recalculate the balance for a period of time starting from a certain day. So the problem I encountered was the following: everything was calculated correctly when executing the code for a single day, but when I was executing for more than one day inside a loop, the calculations were wrong, even though I was grouping/separating the array by dates.

Bi-directional One-To-Many Relationship

Instead of continuing with the real-world example, imagine for a moment the following scenario: a blog with multiple authors. Each author has many posts that can be published at different dates. So let’s say you want to show a report by day with the authors and the total number of views for their posts. It could have value for you or the person running the blog or for your authors. Here’s the class diagram of two simple Doctrine entities:

Using the command bin/console make:entity you can set up quickly these entities:

How would you query the database for the data? Here’s how I did it: inside a service class called AuthorManager, for yesterday and today, inside a loop, I used a custom query inside AuthorRepository which returns the post of an author for a specific date:

// src/AuthorManager.php
namespace App;

use App\Repository\AuthorRepository;
use Doctrine\ORM\EntityManagerInterface;
use Doctrine\Persistence\ManagerRegistry;

class AuthorManager
{
    /** @var AuthorRepository */
    private $repository;

    public function __construct(AuthorRepository $repository)
    {
        $this->repository = $repository;
    }

    public function getAuthors(): array
    {
        $viewsByDay = [];

        $dates = [
            new \DateTime('yesterday'),
            new \DateTime()
        ];

        foreach($dates as $date) {
            $viewsByDay[$date->format('Y-m-d')] = $this->repository->findByDay($date);
        }

        return $viewsByDay;
    }
}
// src/Repository/AuthorRepository.php
/**
* @return Author[] array
*/
public function findByDay(\DateTimeInterface $date): array
{
    return $this->createQueryBuilder('a')
        ->select('a, p')
        ->join('a.blogPosts', 'p')
        ->andWhere('p.published_at = :date')
        ->setParameter('date', $date->format('Y-m-d'))
        ->orderBy('p.published_at')
        ->getQuery()
        ->getResult();
}

Supposing you have a single author with 2 blog posts, you should see something like this, right?

Wrong. This is what you actually get:

Why? What do you mean: Why? It's magic. Instead of getting the Blog post 2 which is for that date, you get the first one. Long story short: it has to do with how Doctrine hydrates your entities with the data from the database. What I supposed it was doing was initializing a new ArrayCollection for each author and adding the blogposts from that specific date. But it doesn’t do it like this. It looks like it’s a bug in Doctrine, specifically the ObjectHydrator, because if you change the hydration mode to AbstractQuery::HYDRATE_ARRAY then you’ll get the data correctly.

If you still want to hydrate as objects and use the method getTotalViews for Author, inside the foreach loop you should call clear() method for the repository: $this→repository→clear();

Here’s a quick debug session using Xdebug, without calling clear:

Another solution is not using a bi-directional relationship. Remove blogPosts from the Author entity and create a method inside BlogPostRepository. This way you will get an array of the BlogPost entities filtered for by a specific author and date.

In my real-world scenario I was referring to at the beginning of the post, I was grouping the merchandise by provider for each day. So instead of querying directly the MerchandiseRepository, I was accessing the merchandise through the Provider, like this: $provider→getMerchandise(). After the first loop, I was starting to get incorrect calculations because the merchandise was not hydrated correctly for every day for each provider; it started to accumulate and while debugging, I noticed merchandise with a different date from the day I was grouping by.

Testing the code

Here’s a test using Foundry to set up the database, instead of the Doctrine Fixtures. Foundry is a factory library for creating fixtures created by Kevin Bond (@zenstruck), heavily inspired by Laravel Factories (see credit).
// tests/AuthorManagerTest.php
namespace App\Tests;

use App\AuthorManager;
use App\Factory\AuthorFactory;
use App\Factory\BlogPostFactory;
use App\Repository\BlogPostRepository;
use Symfony\Bundle\FrameworkBundle\Test\KernelTestCase;
use Zenstruck\Foundry\Test\Factories;
use Zenstruck\Foundry\Test\ResetDatabase;

class AuthorManagerTest extends KernelTestCase
{
    use ResetDatabase, Factories;

    /** @var AuthorManager|object|null */
    private $manager;

    public function setUp()
    {
        self::bootKernel();

        $this->manager =  self::$container->get(AuthorManager::class);
    }

    /**
     * @test
     */
    public function test_views_by_date_foreach_author()
    {
        $author = AuthorFactory::new()->create(['name' => 'Sergio']);
        $yesterday = new \DateTime('yesterday');
        $today = new \DateTime();

        BlogPostFactory::new()->create([
            'author' => $author,
            'publishedAt' => $yesterday,
            'views' => 100
        ]);

        BlogPostFactory::new()->create([
            'author' => $author,
            'publishedAt' => $today,
            'views' => 200
        ]);

        $author->refresh();

        $viewsPerDay = $this->manager->getAuthors();
        $yesterdayViews = $viewsPerDay[$yesterday->format('Y-m-d')];
        $todayViews = $viewsPerDay[$today->format('Y-m-d')];

        // Check yesterday's views are 100 for the only author
        static::assertSame(100, $yesterdayViews[0]->totalViews());

        // Check today's views are 200 for the only author
        static::assertSame(200, $todayViews[0]->totalViews());
    }
}

If you want to stick with HYDRATE_ARRAY instead of working with objects, this test is working. However, if you want to assert against hydrated objects, even if you cleared the collection, the test will fail:

Why? Because in this case the author has both blog posts in that ArrayCollection, and you will get a total of 300 views. The solution?

$author->refresh();

self::$container->get('doctrine')->reset();

$viewsPerDay = $this->manager->getAuthors();

This will force Doctrine to hydrate correctly the blogposts and now the test will pass.

Here’s a GitHub repository with the code. The fixes are commented inside the test and AuthorManager.

A little confession

I got so used to Doctrine, that, at a certain point, without any real prof, I started to disregard other ORM tools. To be honest, I’ve always felt more comfortable with Unit of Work implementation (where you have repositories and through an Entity Manager you persist/flush data to the database), in comparison to other implementations like Active Record. I remember back in the glory days of CodeIgniter that it felt weird from the very beginning to query directly over a model.

Besides Symfony, another popular framework is Laravel, which has Eloquent for database access. I’m pretty sure that in this situation I would have not encountered into this problem, because it doesn’t have so many components under the hood like Doctrine, which keeps track of every change.

Thinking that Doctrine it’s a better way to use a database was quite clear for me until now. I’ve been using it since Symfony 2. Even though this case it’s not some special edge case, I run into unexpected behavior. I’ve read and heard from other developers that sometimes, especially with bi-directional relationships, Doctrine is a pain in the ass to work with (think about Many-To-Many relationship with some dynamic javascript forms).

Besides that, I feature envy Laravel for using Collections to return the data, which sometimes makes working with it a pleasure (see Adam Wathan’s Refactoring to Collections). I would like to do some performance testing using Blackfire for Eloquent and Doctrine since I plan using Laravel for future projects.

Here are some links if you want to read more about this problem:

Thanks for reading and if you found it useful, you can share it.