Tue Jan 23 23:50:49 UTC 2024 I: starting to build python-streamz/bookworm/i386 on jenkins on '2024-01-23 23:50' Tue Jan 23 23:50:49 UTC 2024 I: The jenkins build log is/was available at https://jenkins.debian.net/userContent/reproducible/debian/build_service/i386_1/14048/console.log Tue Jan 23 23:50:49 UTC 2024 I: Downloading source for bookworm/python-streamz=0.6.4-1 --2024-01-23 23:50:49-- http://cdn-fastly.deb.debian.org/debian/pool/main/p/python-streamz/python-streamz_0.6.4-1.dsc Connecting to 78.137.99.97:3128... connected. Proxy request sent, awaiting response... 200 OK Length: 1701 (1.7K) [text/prs.lines.tag] Saving to: ‘python-streamz_0.6.4-1.dsc’ 0K . 100% 252M=0s 2024-01-23 23:50:49 (252 MB/s) - ‘python-streamz_0.6.4-1.dsc’ saved [1701/1701] Tue Jan 23 23:50:49 UTC 2024 I: python-streamz_0.6.4-1.dsc -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Format: 3.0 (quilt) Source: python-streamz Binary: python3-streamz Architecture: all Version: 0.6.4-1 Maintainer: Debian Python Team Uploaders: Nilesh Patra Homepage: https://github.com/python-streamz/streamz/ Standards-Version: 4.6.2 Vcs-Browser: https://salsa.debian.org/python-team/packages/python-streamz Vcs-Git: https://salsa.debian.org/python-team/packages/python-streamz.git Testsuite: autopkgtest, autopkgtest-pkg-python Testsuite-Triggers: python3, python3-pytest Build-Depends: debhelper-compat (= 13), dh-python, python3-all, python3-setuptools, python3-six, python3-toolz, python3-tornado, python3-pytest, python3-requests, python3-dask, python3-distributed, python3-numpy, python3-pandas, python3-flaky , debhelper Package-List: python3-streamz deb python optional arch=all Checksums-Sha1: ef99c9c9556bf29da38d557cc2a9d2d0d73a9e2b 135418 python-streamz_0.6.4.orig.tar.gz 4a66b3ece9f18a565d7e39aa0523ea2ba7cd53e9 5756 python-streamz_0.6.4-1.debian.tar.xz Checksums-Sha256: 2514396fe5d616e3d9ed57a06280d1c17a990f722e62a3f560c6ba0932dcf9f2 135418 python-streamz_0.6.4.orig.tar.gz 
83421407eeafa3c1424b97fcce278aa95a68c97bfc434792474ab58536953ee9 5756 python-streamz_0.6.4-1.debian.tar.xz Files: 92b19c5b15b55608aec2bffdf12c7ea3 135418 python-streamz_0.6.4.orig.tar.gz 154ecc733ea237624d948cf217ce045f 5756 python-streamz_0.6.4-1.debian.tar.xz -----BEGIN PGP SIGNATURE----- iHUEARYIAB0WIQSglbZu4JAkvuai8HIqJ5BL1yQ+2gUCY+5/1AAKCRAqJ5BL1yQ+ 2vmiAQDyaPhQsDIm/MwMlak0TUHjJntVXdfExuLJGTYpyh0NoAD/UDeXewt6JB1N 029uxQykaETy0oJSmKwq0KDdlS7Oews= =bKMw -----END PGP SIGNATURE----- Tue Jan 23 23:50:49 UTC 2024 I: Checking whether the package is not for us Tue Jan 23 23:50:49 UTC 2024 I: Starting 1st build on remote node ionos2-i386.debian.net. Tue Jan 23 23:50:49 UTC 2024 I: Preparing to do remote build '1' on ionos2-i386.debian.net. Tue Jan 23 23:56:05 UTC 2024 I: Deleting $TMPDIR on ionos2-i386.debian.net. I: pbuilder: network access will be disabled during build I: Current time: Tue Jan 23 11:50:53 -12 2024 I: pbuilder-time-stamp: 1706053853 I: Building the build Environment I: extracting base tarball [/var/cache/pbuilder/bookworm-reproducible-base.tgz] I: copying local configuration W: --override-config is not set; not updating apt.conf Read the manpage for details. 
I: mounting /proc filesystem I: mounting /sys filesystem I: creating /{dev,run}/shm I: mounting /dev/pts filesystem I: redirecting /dev/ptmx to /dev/pts/ptmx I: policy-rc.d already exists I: using eatmydata during job I: Copying source file I: copying [python-streamz_0.6.4-1.dsc] I: copying [./python-streamz_0.6.4.orig.tar.gz] I: copying [./python-streamz_0.6.4-1.debian.tar.xz] I: Extracting source gpgv: Signature made Thu Feb 16 19:11:16 2023 gpgv: using EDDSA key A095B66EE09024BEE6A2F0722A27904BD7243EDA gpgv: Can't check signature: No public key dpkg-source: warning: cannot verify inline signature for ./python-streamz_0.6.4-1.dsc: no acceptable signature found dpkg-source: info: extracting python-streamz in python-streamz-0.6.4 dpkg-source: info: unpacking python-streamz_0.6.4.orig.tar.gz dpkg-source: info: unpacking python-streamz_0.6.4-1.debian.tar.xz dpkg-source: info: using patch list from debian/patches/series dpkg-source: info: applying disable-unsupported-tests.patch dpkg-source: info: applying ci-fixes.patch I: Not using root during the build. 
I: Installing the build-deps I: user script /srv/workspace/pbuilder/7484/tmp/hooks/D02_print_environment starting I: set BUILDDIR='/build/reproducible-path' BUILDUSERGECOS='first user,first room,first work-phone,first home-phone,first other' BUILDUSERNAME='pbuilder1' BUILD_ARCH='i386' DEBIAN_FRONTEND='noninteractive' DEB_BUILD_OPTIONS='buildinfo=+all reproducible=+all parallel=8 ' DISTRIBUTION='bookworm' HOME='/root' HOST_ARCH='i386' IFS=' ' INVOCATION_ID='374c878059e846e68ae2fecdb5c5bab5' LANG='C' LANGUAGE='en_US:en' LC_ALL='C' LD_LIBRARY_PATH='/usr/lib/libeatmydata' LD_PRELOAD='libeatmydata.so' MAIL='/var/mail/root' OPTIND='1' PATH='/usr/sbin:/usr/bin:/sbin:/bin:/usr/games' PBCURRENTCOMMANDLINEOPERATION='build' PBUILDER_OPERATION='build' PBUILDER_PKGDATADIR='/usr/share/pbuilder' PBUILDER_PKGLIBDIR='/usr/lib/pbuilder' PBUILDER_SYSCONFDIR='/etc' PPID='7484' PS1='# ' PS2='> ' PS4='+ ' PWD='/' SHELL='/bin/bash' SHLVL='2' SUDO_COMMAND='/usr/bin/timeout -k 18.1h 18h /usr/bin/ionice -c 3 /usr/bin/nice /usr/sbin/pbuilder --build --configfile /srv/reproducible-results/rbuild-debian/r-b-build.HEwUV97q/pbuilderrc_UIow --distribution bookworm --hookdir /etc/pbuilder/first-build-hooks --debbuildopts -b --basetgz /var/cache/pbuilder/bookworm-reproducible-base.tgz --buildresult /srv/reproducible-results/rbuild-debian/r-b-build.HEwUV97q/b1 --logfile b1/build.log python-streamz_0.6.4-1.dsc' SUDO_GID='112' SUDO_UID='107' SUDO_USER='jenkins' TERM='unknown' TZ='/usr/share/zoneinfo/Etc/GMT+12' USER='root' _='/usr/bin/systemd-run' http_proxy='http://78.137.99.97:3128' I: uname -a Linux ionos2-i386 6.1.0-17-686-pae #1 SMP PREEMPT_DYNAMIC Debian 6.1.69-1 (2023-12-30) i686 GNU/Linux I: ls -l /bin total 6036 -rwxr-xr-x 1 root root 1408088 Apr 23 2023 bash -rwxr-xr-x 3 root root 38404 Sep 19 2022 bunzip2 -rwxr-xr-x 3 root root 38404 Sep 19 2022 bzcat lrwxrwxrwx 1 root root 6 Sep 19 2022 bzcmp -> bzdiff -rwxr-xr-x 1 root root 2225 Sep 19 2022 bzdiff lrwxrwxrwx 1 root root 6 Sep 19 2022 
bzegrep -> bzgrep -rwxr-xr-x 1 root root 4893 Nov 27 2021 bzexe lrwxrwxrwx 1 root root 6 Sep 19 2022 bzfgrep -> bzgrep -rwxr-xr-x 1 root root 3775 Sep 19 2022 bzgrep -rwxr-xr-x 3 root root 38404 Sep 19 2022 bzip2 -rwxr-xr-x 1 root root 17892 Sep 19 2022 bzip2recover lrwxrwxrwx 1 root root 6 Sep 19 2022 bzless -> bzmore -rwxr-xr-x 1 root root 1297 Sep 19 2022 bzmore -rwxr-xr-x 1 root root 42920 Sep 20 2022 cat -rwxr-xr-x 1 root root 79816 Sep 20 2022 chgrp -rwxr-xr-x 1 root root 67496 Sep 20 2022 chmod -rwxr-xr-x 1 root root 79816 Sep 20 2022 chown -rwxr-xr-x 1 root root 162024 Sep 20 2022 cp -rwxr-xr-x 1 root root 136916 Jan 5 2023 dash -rwxr-xr-x 1 root root 137160 Sep 20 2022 date -rwxr-xr-x 1 root root 100364 Sep 20 2022 dd -rwxr-xr-x 1 root root 108940 Sep 20 2022 df -rwxr-xr-x 1 root root 162152 Sep 20 2022 dir -rwxr-xr-x 1 root root 87760 Mar 23 2023 dmesg lrwxrwxrwx 1 root root 8 Dec 19 2022 dnsdomainname -> hostname lrwxrwxrwx 1 root root 8 Dec 19 2022 domainname -> hostname -rwxr-xr-x 1 root root 38760 Sep 20 2022 echo -rwxr-xr-x 1 root root 41 Jan 24 2023 egrep -rwxr-xr-x 1 root root 34664 Sep 20 2022 false -rwxr-xr-x 1 root root 41 Jan 24 2023 fgrep -rwxr-xr-x 1 root root 84272 Mar 23 2023 findmnt -rwsr-xr-x 1 root root 30240 Mar 23 2023 fusermount -rwxr-xr-x 1 root root 218680 Jan 24 2023 grep -rwxr-xr-x 2 root root 2346 Apr 10 2022 gunzip -rwxr-xr-x 1 root root 6447 Apr 10 2022 gzexe -rwxr-xr-x 1 root root 100952 Apr 10 2022 gzip -rwxr-xr-x 1 root root 21916 Dec 19 2022 hostname -rwxr-xr-x 1 root root 75756 Sep 20 2022 ln -rwxr-xr-x 1 root root 55600 Mar 23 2023 login -rwxr-xr-x 1 root root 162152 Sep 20 2022 ls -rwxr-xr-x 1 root root 214568 Mar 23 2023 lsblk -rwxr-xr-x 1 root root 96328 Sep 20 2022 mkdir -rwxr-xr-x 1 root root 84008 Sep 20 2022 mknod -rwxr-xr-x 1 root root 38792 Sep 20 2022 mktemp -rwxr-xr-x 1 root root 63016 Mar 23 2023 more -rwsr-xr-x 1 root root 58912 Mar 23 2023 mount -rwxr-xr-x 1 root root 13856 Mar 23 2023 mountpoint -rwxr-xr-x 
1 root root 157932 Sep 20 2022 mv lrwxrwxrwx 1 root root 8 Dec 19 2022 nisdomainname -> hostname lrwxrwxrwx 1 root root 14 Apr 3 2023 pidof -> /sbin/killall5 -rwxr-xr-x 1 root root 38792 Sep 20 2022 pwd lrwxrwxrwx 1 root root 4 Apr 23 2023 rbash -> bash -rwxr-xr-x 1 root root 51080 Sep 20 2022 readlink -rwxr-xr-x 1 root root 75720 Sep 20 2022 rm -rwxr-xr-x 1 root root 51080 Sep 20 2022 rmdir -rwxr-xr-x 1 root root 22308 Jul 28 23:46 run-parts -rwxr-xr-x 1 root root 133224 Jan 5 2023 sed lrwxrwxrwx 1 root root 4 Jan 5 2023 sh -> dash -rwxr-xr-x 1 root root 38760 Sep 20 2022 sleep -rwxr-xr-x 1 root root 87976 Sep 20 2022 stty -rwsr-xr-x 1 root root 83492 Mar 23 2023 su -rwxr-xr-x 1 root root 38792 Sep 20 2022 sync -rwxr-xr-x 1 root root 598456 Apr 6 2023 tar -rwxr-xr-x 1 root root 13860 Jul 28 23:46 tempfile -rwxr-xr-x 1 root root 120776 Sep 20 2022 touch -rwxr-xr-x 1 root root 34664 Sep 20 2022 true -rwxr-xr-x 1 root root 17892 Mar 23 2023 ulockmgr_server -rwsr-xr-x 1 root root 30236 Mar 23 2023 umount -rwxr-xr-x 1 root root 38760 Sep 20 2022 uname -rwxr-xr-x 2 root root 2346 Apr 10 2022 uncompress -rwxr-xr-x 1 root root 162152 Sep 20 2022 vdir -rwxr-xr-x 1 root root 71216 Mar 23 2023 wdctl lrwxrwxrwx 1 root root 8 Dec 19 2022 ypdomainname -> hostname -rwxr-xr-x 1 root root 1984 Apr 10 2022 zcat -rwxr-xr-x 1 root root 1678 Apr 10 2022 zcmp -rwxr-xr-x 1 root root 6460 Apr 10 2022 zdiff -rwxr-xr-x 1 root root 29 Apr 10 2022 zegrep -rwxr-xr-x 1 root root 29 Apr 10 2022 zfgrep -rwxr-xr-x 1 root root 2081 Apr 10 2022 zforce -rwxr-xr-x 1 root root 8103 Apr 10 2022 zgrep -rwxr-xr-x 1 root root 2206 Apr 10 2022 zless -rwxr-xr-x 1 root root 1842 Apr 10 2022 zmore -rwxr-xr-x 1 root root 4577 Apr 10 2022 znew I: user script /srv/workspace/pbuilder/7484/tmp/hooks/D02_print_environment finished -> Attempting to satisfy build-dependencies -> Creating pbuilder-satisfydepends-dummy package Package: pbuilder-satisfydepends-dummy Version: 0.invalid.0 Architecture: i386 Maintainer: 
Debian Pbuilder Team Description: Dummy package to satisfy dependencies with aptitude - created by pbuilder This package was created automatically by pbuilder to satisfy the build-dependencies of the package being currently built. Depends: debhelper-compat (= 13), dh-python, python3-all, python3-setuptools, python3-six, python3-toolz, python3-tornado, python3-pytest, python3-requests, python3-dask, python3-distributed, python3-numpy, python3-pandas, python3-flaky, debhelper dpkg-deb: building package 'pbuilder-satisfydepends-dummy' in '/tmp/satisfydepends-aptitude/pbuilder-satisfydepends-dummy.deb'. Selecting previously unselected package pbuilder-satisfydepends-dummy. (Reading database ... 18156 files and directories currently installed.) Preparing to unpack .../pbuilder-satisfydepends-dummy.deb ... Unpacking pbuilder-satisfydepends-dummy (0.invalid.0) ... dpkg: pbuilder-satisfydepends-dummy: dependency problems, but configuring anyway as you requested: pbuilder-satisfydepends-dummy depends on debhelper-compat (= 13); however: Package debhelper-compat is not installed. pbuilder-satisfydepends-dummy depends on dh-python; however: Package dh-python is not installed. pbuilder-satisfydepends-dummy depends on python3-all; however: Package python3-all is not installed. pbuilder-satisfydepends-dummy depends on python3-setuptools; however: Package python3-setuptools is not installed. pbuilder-satisfydepends-dummy depends on python3-six; however: Package python3-six is not installed. pbuilder-satisfydepends-dummy depends on python3-toolz; however: Package python3-toolz is not installed. pbuilder-satisfydepends-dummy depends on python3-tornado; however: Package python3-tornado is not installed. pbuilder-satisfydepends-dummy depends on python3-pytest; however: Package python3-pytest is not installed. pbuilder-satisfydepends-dummy depends on python3-requests; however: Package python3-requests is not installed. 
pbuilder-satisfydepends-dummy depends on python3-dask; however: Package python3-dask is not installed. pbuilder-satisfydepends-dummy depends on python3-distributed; however: Package python3-distributed is not installed. pbuilder-satisfydepends-dummy depends on python3-numpy; however: Package python3-numpy is not installed. pbuilder-satisfydepends-dummy depends on python3-pandas; however: Package python3-pandas is not installed. pbuilder-satisfydepends-dummy depends on python3-flaky; however: Package python3-flaky is not installed. pbuilder-satisfydepends-dummy depends on debhelper; however: Package debhelper is not installed. Setting up pbuilder-satisfydepends-dummy (0.invalid.0) ... Reading package lists... Building dependency tree... Reading state information... Initializing package states... Writing extended state information... Building tag database... pbuilder-satisfydepends-dummy is already installed at the requested version (0.invalid.0) pbuilder-satisfydepends-dummy is already installed at the requested version (0.invalid.0) The following NEW packages will be installed: autoconf{a} automake{a} autopoint{a} autotools-dev{a} bsdextrautils{a} ca-certificates{a} debhelper{a} dh-autoreconf{a} dh-python{a} dh-strip-nondeterminism{a} dwz{a} file{a} gettext{a} gettext-base{a} groff-base{a} intltool-debian{a} libarchive-zip-perl{a} libblas3{a} libdebhelper-perl{a} libelf1{a} libexpat1{a} libfile-stripnondeterminism-perl{a} libgfortran5{a} libicu72{a} liblapack3{a} libmagic-mgc{a} libmagic1{a} libpipeline1{a} libpython3-stdlib{a} libpython3.11-minimal{a} libpython3.11-stdlib{a} libreadline8{a} libsub-override-perl{a} libtool{a} libuchardet0{a} libxml2{a} libyaml-0-2{a} m4{a} man-db{a} media-types{a} openssl{a} po-debconf{a} python3{a} python3-all{a} python3-attr{a} python3-certifi{a} python3-chardet{a} python3-charset-normalizer{a} python3-click{a} python3-cloudpickle{a} python3-colorama{a} python3-dask{a} python3-dateutil{a} python3-distributed{a} 
python3-distutils{a} python3-flaky{a} python3-fsspec{a} python3-heapdict{a} python3-idna{a} python3-iniconfig{a} python3-jinja2{a} python3-lib2to3{a} python3-locket{a} python3-markupsafe{a} python3-minimal{a} python3-more-itertools{a} python3-msgpack{a} python3-numpy{a} python3-packaging{a} python3-pandas{a} python3-pandas-lib{a} python3-partd{a} python3-pkg-resources{a} python3-pluggy{a} python3-psutil{a} python3-py{a} python3-pytest{a} python3-requests{a} python3-setuptools{a} python3-six{a} python3-sortedcontainers{a} python3-tblib{a} python3-toolz{a} python3-tornado{a} python3-tz{a} python3-urllib3{a} python3-yaml{a} python3-zict{a} python3.11{a} python3.11-minimal{a} readline-common{a} sensible-utils{a} tzdata{a} The following packages are RECOMMENDED but will NOT be installed: curl git libarchive-cpio-perl libltdl-dev libmail-sendmail-perl lynx python3-babel python3-blosc python3-bottleneck python3-bs4 python3-dropbox python3-fusepy python3-html5lib python3-libarchive-c python3-lxml python3-matplotlib python3-numexpr python3-odf python3-openpyxl python3-paramiko python3-pygit2 python3-pygments python3-scipy python3-tables python3-tqdm python3-zmq wget 0 packages upgraded, 93 newly installed, 0 to remove and 0 not upgraded. Need to get 47.4 MB of archives. After unpacking 211 MB will be used. Writing extended state information... 
Get: 1 http://deb.debian.org/debian bookworm/main i386 libpython3.11-minimal i386 3.11.2-6 [813 kB] Get: 2 http://deb.debian.org/debian bookworm/main i386 libexpat1 i386 2.5.0-1 [103 kB] Get: 3 http://deb.debian.org/debian bookworm/main i386 python3.11-minimal i386 3.11.2-6 [2130 kB] Get: 4 http://deb.debian.org/debian bookworm/main i386 python3-minimal i386 3.11.2-1+b1 [26.3 kB] Get: 5 http://deb.debian.org/debian bookworm/main i386 media-types all 10.0.0 [26.1 kB] Get: 6 http://deb.debian.org/debian bookworm/main i386 readline-common all 8.2-1.3 [69.0 kB] Get: 7 http://deb.debian.org/debian bookworm/main i386 libreadline8 i386 8.2-1.3 [171 kB] Get: 8 http://deb.debian.org/debian bookworm/main i386 libpython3.11-stdlib i386 3.11.2-6 [1799 kB] Get: 9 http://deb.debian.org/debian bookworm/main i386 python3.11 i386 3.11.2-6 [572 kB] Get: 10 http://deb.debian.org/debian bookworm/main i386 libpython3-stdlib i386 3.11.2-1+b1 [9308 B] Get: 11 http://deb.debian.org/debian bookworm/main i386 python3 i386 3.11.2-1+b1 [26.3 kB] Get: 12 http://deb.debian.org/debian bookworm/main i386 tzdata all 2023c-5+deb12u1 [296 kB] Get: 13 http://deb.debian.org/debian bookworm/main i386 sensible-utils all 0.0.17+nmu1 [19.0 kB] Get: 14 http://deb.debian.org/debian bookworm/main i386 openssl i386 3.0.11-1~deb12u2 [1423 kB] Get: 15 http://deb.debian.org/debian bookworm/main i386 ca-certificates all 20230311 [153 kB] Get: 16 http://deb.debian.org/debian bookworm/main i386 libmagic-mgc i386 1:5.44-3 [305 kB] Get: 17 http://deb.debian.org/debian bookworm/main i386 libmagic1 i386 1:5.44-3 [114 kB] Get: 18 http://deb.debian.org/debian bookworm/main i386 file i386 1:5.44-3 [42.5 kB] Get: 19 http://deb.debian.org/debian bookworm/main i386 gettext-base i386 0.21-12 [162 kB] Get: 20 http://deb.debian.org/debian bookworm/main i386 libuchardet0 i386 0.0.7-1 [67.9 kB] Get: 21 http://deb.debian.org/debian bookworm/main i386 groff-base i386 1.22.4-10 [932 kB] Get: 22 http://deb.debian.org/debian 
bookworm/main i386 bsdextrautils i386 2.38.1-5+b1 [90.3 kB] Get: 23 http://deb.debian.org/debian bookworm/main i386 libpipeline1 i386 1.5.7-1 [40.0 kB] Get: 24 http://deb.debian.org/debian bookworm/main i386 man-db i386 2.11.2-2 [1397 kB] Get: 25 http://deb.debian.org/debian bookworm/main i386 m4 i386 1.4.19-3 [294 kB] Get: 26 http://deb.debian.org/debian bookworm/main i386 autoconf all 2.71-3 [332 kB] Get: 27 http://deb.debian.org/debian bookworm/main i386 autotools-dev all 20220109.1 [51.6 kB] Get: 28 http://deb.debian.org/debian bookworm/main i386 automake all 1:1.16.5-1.3 [823 kB] Get: 29 http://deb.debian.org/debian bookworm/main i386 autopoint all 0.21-12 [495 kB] Get: 30 http://deb.debian.org/debian bookworm/main i386 libdebhelper-perl all 13.11.4 [81.2 kB] Get: 31 http://deb.debian.org/debian bookworm/main i386 libtool all 2.4.7-5 [517 kB] Get: 32 http://deb.debian.org/debian bookworm/main i386 dh-autoreconf all 20 [17.1 kB] Get: 33 http://deb.debian.org/debian bookworm/main i386 libarchive-zip-perl all 1.68-1 [104 kB] Get: 34 http://deb.debian.org/debian bookworm/main i386 libsub-override-perl all 0.09-4 [9304 B] Get: 35 http://deb.debian.org/debian bookworm/main i386 libfile-stripnondeterminism-perl all 1.13.1-1 [19.4 kB] Get: 36 http://deb.debian.org/debian bookworm/main i386 dh-strip-nondeterminism all 1.13.1-1 [8620 B] Get: 37 http://deb.debian.org/debian bookworm/main i386 libelf1 i386 0.188-2.1 [179 kB] Get: 38 http://deb.debian.org/debian bookworm/main i386 dwz i386 0.15-1 [118 kB] Get: 39 http://deb.debian.org/debian bookworm/main i386 libicu72 i386 72.1-3 [9541 kB] Get: 40 http://deb.debian.org/debian bookworm/main i386 libxml2 i386 2.9.14+dfsg-1.3~deb12u1 [720 kB] Get: 41 http://deb.debian.org/debian bookworm/main i386 gettext i386 0.21-12 [1311 kB] Get: 42 http://deb.debian.org/debian bookworm/main i386 intltool-debian all 0.35.0+20060710.6 [22.9 kB] Get: 43 http://deb.debian.org/debian bookworm/main i386 po-debconf all 1.0.21+nmu1 [248 kB] Get: 
44 http://deb.debian.org/debian bookworm/main i386 debhelper all 13.11.4 [942 kB] Get: 45 http://deb.debian.org/debian bookworm/main i386 python3-lib2to3 all 3.11.2-3 [76.3 kB] Get: 46 http://deb.debian.org/debian bookworm/main i386 python3-distutils all 3.11.2-3 [131 kB] Get: 47 http://deb.debian.org/debian bookworm/main i386 dh-python all 5.20230130+deb12u1 [104 kB] Get: 48 http://deb.debian.org/debian bookworm/main i386 libblas3 i386 3.11.0-2 [139 kB] Get: 49 http://deb.debian.org/debian bookworm/main i386 libgfortran5 i386 12.2.0-14 [698 kB] Get: 50 http://deb.debian.org/debian bookworm/main i386 liblapack3 i386 3.11.0-2 [2092 kB] Get: 51 http://deb.debian.org/debian bookworm/main i386 libyaml-0-2 i386 0.2.5-1 [55.9 kB] Get: 52 http://deb.debian.org/debian bookworm/main i386 python3-all i386 3.11.2-1+b1 [1056 B] Get: 53 http://deb.debian.org/debian bookworm/main i386 python3-attr all 22.2.0-1 [65.4 kB] Get: 54 http://deb.debian.org/debian bookworm/main i386 python3-certifi all 2022.9.24-1 [153 kB] Get: 55 http://deb.debian.org/debian bookworm/main i386 python3-pkg-resources all 66.1.1-1 [296 kB] Get: 56 http://deb.debian.org/debian bookworm/main i386 python3-chardet all 5.1.0+dfsg-2 [110 kB] Get: 57 http://deb.debian.org/debian bookworm/main i386 python3-charset-normalizer all 3.0.1-2 [49.3 kB] Get: 58 http://deb.debian.org/debian bookworm/main i386 python3-colorama all 0.4.6-2 [36.8 kB] Get: 59 http://deb.debian.org/debian bookworm/main i386 python3-click all 8.1.3-2 [92.2 kB] Get: 60 http://deb.debian.org/debian bookworm/main i386 python3-cloudpickle all 2.2.0-1 [23.6 kB] Get: 61 http://deb.debian.org/debian bookworm/main i386 python3-fsspec all 2022.11.0-1 [100 kB] Get: 62 http://deb.debian.org/debian bookworm/main i386 python3-toolz all 0.12.0-1 [43.3 kB] Get: 63 http://deb.debian.org/debian bookworm/main i386 python3-packaging all 23.0-1 [32.5 kB] Get: 64 http://deb.debian.org/debian bookworm/main i386 python3-locket all 1.0.0-1 [5804 B] Get: 65 
http://deb.debian.org/debian bookworm/main i386 python3-partd all 1.3.0-1 [15.1 kB] Get: 66 http://deb.debian.org/debian bookworm/main i386 python3-yaml i386 6.0-3+b2 [119 kB] Get: 67 http://deb.debian.org/debian bookworm/main i386 python3-dask all 2022.12.1+dfsg-2 [865 kB] Get: 68 http://deb.debian.org/debian bookworm/main i386 python3-six all 1.16.0-4 [17.5 kB] Get: 69 http://deb.debian.org/debian bookworm/main i386 python3-dateutil all 2.8.2-2 [78.3 kB] Get: 70 http://deb.debian.org/debian bookworm/main i386 python3-markupsafe i386 2.1.2-1+b1 [13.3 kB] Get: 71 http://deb.debian.org/debian bookworm/main i386 python3-jinja2 all 3.1.2-1 [119 kB] Get: 72 http://deb.debian.org/debian bookworm/main i386 python3-msgpack i386 1.0.3-2+b1 [70.0 kB] Get: 73 http://deb.debian.org/debian bookworm/main i386 python3-psutil i386 5.9.4-1+b1 [190 kB] Get: 74 http://deb.debian.org/debian bookworm/main i386 python3-sortedcontainers all 2.4.0-2 [31.9 kB] Get: 75 http://deb.debian.org/debian bookworm/main i386 python3-tblib all 1.7.0-3 [13.2 kB] Get: 76 http://deb.debian.org/debian bookworm/main i386 python3-tornado i386 6.2.0-3 [337 kB] Get: 77 http://deb.debian.org/debian bookworm/main i386 python3-urllib3 all 1.26.12-1 [117 kB] Get: 78 http://deb.debian.org/debian bookworm/main i386 python3-heapdict all 1.0.1-2 [5404 B] Get: 79 http://deb.debian.org/debian bookworm/main i386 python3-zict all 2.2.0-1 [16.7 kB] Get: 80 http://deb.debian.org/debian bookworm/main i386 python3-distributed all 2022.12.1+ds.1-3 [1029 kB] Get: 81 http://deb.debian.org/debian bookworm/main i386 python3-flaky all 3.7.0-2 [20.2 kB] Get: 82 http://deb.debian.org/debian bookworm/main i386 python3-idna all 3.3-1 [39.4 kB] Get: 83 http://deb.debian.org/debian bookworm/main i386 python3-iniconfig all 1.1.1-2 [6396 B] Get: 84 http://deb.debian.org/debian bookworm/main i386 python3-more-itertools all 8.10.0-2 [53.0 kB] Get: 85 http://deb.debian.org/debian bookworm/main i386 python3-numpy i386 1:1.24.2-1 [6115 kB] 
Get: 86 http://deb.debian.org/debian bookworm/main i386 python3-tz all 2022.7.1-4 [30.1 kB] Get: 87 http://deb.debian.org/debian bookworm/main i386 python3-pandas-lib i386 1.5.3+dfsg-2 [3382 kB] Get: 88 http://deb.debian.org/debian bookworm/main i386 python3-pandas all 1.5.3+dfsg-2 [2885 kB] Get: 89 http://deb.debian.org/debian bookworm/main i386 python3-pluggy all 1.0.0+repack-1 [19.7 kB] Get: 90 http://deb.debian.org/debian bookworm/main i386 python3-py all 1.11.0-1 [89.2 kB] Get: 91 http://deb.debian.org/debian bookworm/main i386 python3-pytest all 7.2.1-2 [236 kB] Get: 92 http://deb.debian.org/debian bookworm/main i386 python3-requests all 2.28.1+dfsg-1 [67.9 kB] Get: 93 http://deb.debian.org/debian bookworm/main i386 python3-setuptools all 66.1.1-1 [521 kB] Fetched 47.4 MB in 3s (14.9 MB/s) debconf: delaying package configuration, since apt-utils is not installed Selecting previously unselected package libpython3.11-minimal:i386. (Reading database ... 18156 files and directories currently installed.) Preparing to unpack .../libpython3.11-minimal_3.11.2-6_i386.deb ... Unpacking libpython3.11-minimal:i386 (3.11.2-6) ... Selecting previously unselected package libexpat1:i386. Preparing to unpack .../libexpat1_2.5.0-1_i386.deb ... Unpacking libexpat1:i386 (2.5.0-1) ... Selecting previously unselected package python3.11-minimal. Preparing to unpack .../python3.11-minimal_3.11.2-6_i386.deb ... 
Unpacking python3.11-minimal (3.11.2-6) ... Setting up libpython3.11-minimal:i386 (3.11.2-6) ... Setting up libexpat1:i386 (2.5.0-1) ... Setting up python3.11-minimal (3.11.2-6) ... Selecting previously unselected package python3-minimal. (Reading database ... 18472 files and directories currently installed.) Preparing to unpack .../0-python3-minimal_3.11.2-1+b1_i386.deb ... Unpacking python3-minimal (3.11.2-1+b1) ... Selecting previously unselected package media-types. Preparing to unpack .../1-media-types_10.0.0_all.deb ... Unpacking media-types (10.0.0) ... Selecting previously unselected package readline-common. Preparing to unpack .../2-readline-common_8.2-1.3_all.deb ... Unpacking readline-common (8.2-1.3) ... Selecting previously unselected package libreadline8:i386. Preparing to unpack .../3-libreadline8_8.2-1.3_i386.deb ... Unpacking libreadline8:i386 (8.2-1.3) ... Selecting previously unselected package libpython3.11-stdlib:i386. Preparing to unpack .../4-libpython3.11-stdlib_3.11.2-6_i386.deb ... Unpacking libpython3.11-stdlib:i386 (3.11.2-6) ... Selecting previously unselected package python3.11. Preparing to unpack .../5-python3.11_3.11.2-6_i386.deb ... Unpacking python3.11 (3.11.2-6) ... Selecting previously unselected package libpython3-stdlib:i386. Preparing to unpack .../6-libpython3-stdlib_3.11.2-1+b1_i386.deb ... Unpacking libpython3-stdlib:i386 (3.11.2-1+b1) ... Setting up python3-minimal (3.11.2-1+b1) ... 
Selecting previously unselected package python3. (Reading database ... 18906 files and directories currently installed.) Preparing to unpack .../00-python3_3.11.2-1+b1_i386.deb ... Unpacking python3 (3.11.2-1+b1) ... Selecting previously unselected package tzdata. Preparing to unpack .../01-tzdata_2023c-5+deb12u1_all.deb ... Unpacking tzdata (2023c-5+deb12u1) ... Selecting previously unselected package sensible-utils. Preparing to unpack .../02-sensible-utils_0.0.17+nmu1_all.deb ... Unpacking sensible-utils (0.0.17+nmu1) ... Selecting previously unselected package openssl. Preparing to unpack .../03-openssl_3.0.11-1~deb12u2_i386.deb ... Unpacking openssl (3.0.11-1~deb12u2) ... Selecting previously unselected package ca-certificates. Preparing to unpack .../04-ca-certificates_20230311_all.deb ... Unpacking ca-certificates (20230311) ... Selecting previously unselected package libmagic-mgc. Preparing to unpack .../05-libmagic-mgc_1%3a5.44-3_i386.deb ... Unpacking libmagic-mgc (1:5.44-3) ... Selecting previously unselected package libmagic1:i386. Preparing to unpack .../06-libmagic1_1%3a5.44-3_i386.deb ... Unpacking libmagic1:i386 (1:5.44-3) ... Selecting previously unselected package file. Preparing to unpack .../07-file_1%3a5.44-3_i386.deb ... Unpacking file (1:5.44-3) ... Selecting previously unselected package gettext-base. Preparing to unpack .../08-gettext-base_0.21-12_i386.deb ... Unpacking gettext-base (0.21-12) ... 
Selecting previously unselected package libuchardet0:i386. Preparing to unpack .../09-libuchardet0_0.0.7-1_i386.deb ... Unpacking libuchardet0:i386 (0.0.7-1) ... Selecting previously unselected package groff-base. Preparing to unpack .../10-groff-base_1.22.4-10_i386.deb ... Unpacking groff-base (1.22.4-10) ... Selecting previously unselected package bsdextrautils. Preparing to unpack .../11-bsdextrautils_2.38.1-5+b1_i386.deb ... Unpacking bsdextrautils (2.38.1-5+b1) ... Selecting previously unselected package libpipeline1:i386. Preparing to unpack .../12-libpipeline1_1.5.7-1_i386.deb ... Unpacking libpipeline1:i386 (1.5.7-1) ... Selecting previously unselected package man-db. Preparing to unpack .../13-man-db_2.11.2-2_i386.deb ... Unpacking man-db (2.11.2-2) ... Selecting previously unselected package m4. Preparing to unpack .../14-m4_1.4.19-3_i386.deb ... Unpacking m4 (1.4.19-3) ... Selecting previously unselected package autoconf. Preparing to unpack .../15-autoconf_2.71-3_all.deb ... Unpacking autoconf (2.71-3) ... Selecting previously unselected package autotools-dev. Preparing to unpack .../16-autotools-dev_20220109.1_all.deb ... Unpacking autotools-dev (20220109.1) ... Selecting previously unselected package automake. Preparing to unpack .../17-automake_1%3a1.16.5-1.3_all.deb ... Unpacking automake (1:1.16.5-1.3) ... Selecting previously unselected package autopoint. Preparing to unpack .../18-autopoint_0.21-12_all.deb ... Unpacking autopoint (0.21-12) ... Selecting previously unselected package libdebhelper-perl. Preparing to unpack .../19-libdebhelper-perl_13.11.4_all.deb ... Unpacking libdebhelper-perl (13.11.4) ... Selecting previously unselected package libtool. Preparing to unpack .../20-libtool_2.4.7-5_all.deb ... Unpacking libtool (2.4.7-5) ... Selecting previously unselected package dh-autoreconf. Preparing to unpack .../21-dh-autoreconf_20_all.deb ... Unpacking dh-autoreconf (20) ... Selecting previously unselected package libarchive-zip-perl. 
Preparing to unpack .../22-libarchive-zip-perl_1.68-1_all.deb ... Unpacking libarchive-zip-perl (1.68-1) ... Selecting previously unselected package libsub-override-perl. Preparing to unpack .../23-libsub-override-perl_0.09-4_all.deb ... Unpacking libsub-override-perl (0.09-4) ... Selecting previously unselected package libfile-stripnondeterminism-perl. Preparing to unpack .../24-libfile-stripnondeterminism-perl_1.13.1-1_all.deb ... Unpacking libfile-stripnondeterminism-perl (1.13.1-1) ... Selecting previously unselected package dh-strip-nondeterminism. Preparing to unpack .../25-dh-strip-nondeterminism_1.13.1-1_all.deb ... Unpacking dh-strip-nondeterminism (1.13.1-1) ... Selecting previously unselected package libelf1:i386. Preparing to unpack .../26-libelf1_0.188-2.1_i386.deb ... Unpacking libelf1:i386 (0.188-2.1) ... Selecting previously unselected package dwz. Preparing to unpack .../27-dwz_0.15-1_i386.deb ... Unpacking dwz (0.15-1) ... Selecting previously unselected package libicu72:i386. Preparing to unpack .../28-libicu72_72.1-3_i386.deb ... Unpacking libicu72:i386 (72.1-3) ... Selecting previously unselected package libxml2:i386. Preparing to unpack .../29-libxml2_2.9.14+dfsg-1.3~deb12u1_i386.deb ... Unpacking libxml2:i386 (2.9.14+dfsg-1.3~deb12u1) ... Selecting previously unselected package gettext. Preparing to unpack .../30-gettext_0.21-12_i386.deb ... Unpacking gettext (0.21-12) ... Selecting previously unselected package intltool-debian. Preparing to unpack .../31-intltool-debian_0.35.0+20060710.6_all.deb ... Unpacking intltool-debian (0.35.0+20060710.6) ... Selecting previously unselected package po-debconf. Preparing to unpack .../32-po-debconf_1.0.21+nmu1_all.deb ... Unpacking po-debconf (1.0.21+nmu1) ... Selecting previously unselected package debhelper. Preparing to unpack .../33-debhelper_13.11.4_all.deb ... Unpacking debhelper (13.11.4) ... Selecting previously unselected package python3-lib2to3. 
Preparing to unpack .../34-python3-lib2to3_3.11.2-3_all.deb ... Unpacking python3-lib2to3 (3.11.2-3) ... Selecting previously unselected package python3-distutils. Preparing to unpack .../35-python3-distutils_3.11.2-3_all.deb ... Unpacking python3-distutils (3.11.2-3) ... Selecting previously unselected package dh-python. Preparing to unpack .../36-dh-python_5.20230130+deb12u1_all.deb ... Unpacking dh-python (5.20230130+deb12u1) ... Selecting previously unselected package libblas3:i386. Preparing to unpack .../37-libblas3_3.11.0-2_i386.deb ... Unpacking libblas3:i386 (3.11.0-2) ... Selecting previously unselected package libgfortran5:i386. Preparing to unpack .../38-libgfortran5_12.2.0-14_i386.deb ... Unpacking libgfortran5:i386 (12.2.0-14) ... Selecting previously unselected package liblapack3:i386. Preparing to unpack .../39-liblapack3_3.11.0-2_i386.deb ... Unpacking liblapack3:i386 (3.11.0-2) ... Selecting previously unselected package libyaml-0-2:i386. Preparing to unpack .../40-libyaml-0-2_0.2.5-1_i386.deb ... Unpacking libyaml-0-2:i386 (0.2.5-1) ... Selecting previously unselected package python3-all. Preparing to unpack .../41-python3-all_3.11.2-1+b1_i386.deb ... Unpacking python3-all (3.11.2-1+b1) ... Selecting previously unselected package python3-attr. Preparing to unpack .../42-python3-attr_22.2.0-1_all.deb ... Unpacking python3-attr (22.2.0-1) ... Selecting previously unselected package python3-certifi. Preparing to unpack .../43-python3-certifi_2022.9.24-1_all.deb ... Unpacking python3-certifi (2022.9.24-1) ... Selecting previously unselected package python3-pkg-resources. Preparing to unpack .../44-python3-pkg-resources_66.1.1-1_all.deb ... Unpacking python3-pkg-resources (66.1.1-1) ... Selecting previously unselected package python3-chardet. Preparing to unpack .../45-python3-chardet_5.1.0+dfsg-2_all.deb ... Unpacking python3-chardet (5.1.0+dfsg-2) ... Selecting previously unselected package python3-charset-normalizer. 
Preparing to unpack .../46-python3-charset-normalizer_3.0.1-2_all.deb ... Unpacking python3-charset-normalizer (3.0.1-2) ... Selecting previously unselected package python3-colorama. Preparing to unpack .../47-python3-colorama_0.4.6-2_all.deb ... Unpacking python3-colorama (0.4.6-2) ... Selecting previously unselected package python3-click. Preparing to unpack .../48-python3-click_8.1.3-2_all.deb ... Unpacking python3-click (8.1.3-2) ... Selecting previously unselected package python3-cloudpickle. Preparing to unpack .../49-python3-cloudpickle_2.2.0-1_all.deb ... Unpacking python3-cloudpickle (2.2.0-1) ... Selecting previously unselected package python3-fsspec. Preparing to unpack .../50-python3-fsspec_2022.11.0-1_all.deb ... Unpacking python3-fsspec (2022.11.0-1) ... Selecting previously unselected package python3-toolz. Preparing to unpack .../51-python3-toolz_0.12.0-1_all.deb ... Unpacking python3-toolz (0.12.0-1) ... Selecting previously unselected package python3-packaging. Preparing to unpack .../52-python3-packaging_23.0-1_all.deb ... Unpacking python3-packaging (23.0-1) ... Selecting previously unselected package python3-locket. Preparing to unpack .../53-python3-locket_1.0.0-1_all.deb ... Unpacking python3-locket (1.0.0-1) ... Selecting previously unselected package python3-partd. Preparing to unpack .../54-python3-partd_1.3.0-1_all.deb ... Unpacking python3-partd (1.3.0-1) ... Selecting previously unselected package python3-yaml. Preparing to unpack .../55-python3-yaml_6.0-3+b2_i386.deb ... Unpacking python3-yaml (6.0-3+b2) ... Selecting previously unselected package python3-dask. Preparing to unpack .../56-python3-dask_2022.12.1+dfsg-2_all.deb ... Unpacking python3-dask (2022.12.1+dfsg-2) ... Selecting previously unselected package python3-six. Preparing to unpack .../57-python3-six_1.16.0-4_all.deb ... Unpacking python3-six (1.16.0-4) ... Selecting previously unselected package python3-dateutil. 
Preparing to unpack .../58-python3-dateutil_2.8.2-2_all.deb ... Unpacking python3-dateutil (2.8.2-2) ... Selecting previously unselected package python3-markupsafe. Preparing to unpack .../59-python3-markupsafe_2.1.2-1+b1_i386.deb ... Unpacking python3-markupsafe (2.1.2-1+b1) ... Selecting previously unselected package python3-jinja2. Preparing to unpack .../60-python3-jinja2_3.1.2-1_all.deb ... Unpacking python3-jinja2 (3.1.2-1) ... Selecting previously unselected package python3-msgpack. Preparing to unpack .../61-python3-msgpack_1.0.3-2+b1_i386.deb ... Unpacking python3-msgpack (1.0.3-2+b1) ... Selecting previously unselected package python3-psutil. Preparing to unpack .../62-python3-psutil_5.9.4-1+b1_i386.deb ... Unpacking python3-psutil (5.9.4-1+b1) ... Selecting previously unselected package python3-sortedcontainers. Preparing to unpack .../63-python3-sortedcontainers_2.4.0-2_all.deb ... Unpacking python3-sortedcontainers (2.4.0-2) ... Selecting previously unselected package python3-tblib. Preparing to unpack .../64-python3-tblib_1.7.0-3_all.deb ... Unpacking python3-tblib (1.7.0-3) ... Selecting previously unselected package python3-tornado. Preparing to unpack .../65-python3-tornado_6.2.0-3_i386.deb ... Unpacking python3-tornado (6.2.0-3) ... Selecting previously unselected package python3-urllib3. Preparing to unpack .../66-python3-urllib3_1.26.12-1_all.deb ... Unpacking python3-urllib3 (1.26.12-1) ... Selecting previously unselected package python3-heapdict. Preparing to unpack .../67-python3-heapdict_1.0.1-2_all.deb ... Unpacking python3-heapdict (1.0.1-2) ... Selecting previously unselected package python3-zict. Preparing to unpack .../68-python3-zict_2.2.0-1_all.deb ... Unpacking python3-zict (2.2.0-1) ... Selecting previously unselected package python3-distributed. Preparing to unpack .../69-python3-distributed_2022.12.1+ds.1-3_all.deb ... Unpacking python3-distributed (2022.12.1+ds.1-3) ... Selecting previously unselected package python3-flaky. 
Preparing to unpack .../70-python3-flaky_3.7.0-2_all.deb ... Unpacking python3-flaky (3.7.0-2) ... Selecting previously unselected package python3-idna. Preparing to unpack .../71-python3-idna_3.3-1_all.deb ... Unpacking python3-idna (3.3-1) ... Selecting previously unselected package python3-iniconfig. Preparing to unpack .../72-python3-iniconfig_1.1.1-2_all.deb ... Unpacking python3-iniconfig (1.1.1-2) ... Selecting previously unselected package python3-more-itertools. Preparing to unpack .../73-python3-more-itertools_8.10.0-2_all.deb ... Unpacking python3-more-itertools (8.10.0-2) ... Selecting previously unselected package python3-numpy. Preparing to unpack .../74-python3-numpy_1%3a1.24.2-1_i386.deb ... Unpacking python3-numpy (1:1.24.2-1) ... Selecting previously unselected package python3-tz. Preparing to unpack .../75-python3-tz_2022.7.1-4_all.deb ... Unpacking python3-tz (2022.7.1-4) ... Selecting previously unselected package python3-pandas-lib:i386. Preparing to unpack .../76-python3-pandas-lib_1.5.3+dfsg-2_i386.deb ... Unpacking python3-pandas-lib:i386 (1.5.3+dfsg-2) ... Selecting previously unselected package python3-pandas. Preparing to unpack .../77-python3-pandas_1.5.3+dfsg-2_all.deb ... Unpacking python3-pandas (1.5.3+dfsg-2) ... Selecting previously unselected package python3-pluggy. Preparing to unpack .../78-python3-pluggy_1.0.0+repack-1_all.deb ... Unpacking python3-pluggy (1.0.0+repack-1) ... Selecting previously unselected package python3-py. Preparing to unpack .../79-python3-py_1.11.0-1_all.deb ... Unpacking python3-py (1.11.0-1) ... Selecting previously unselected package python3-pytest. Preparing to unpack .../80-python3-pytest_7.2.1-2_all.deb ... Unpacking python3-pytest (7.2.1-2) ... Selecting previously unselected package python3-requests. Preparing to unpack .../81-python3-requests_2.28.1+dfsg-1_all.deb ... Unpacking python3-requests (2.28.1+dfsg-1) ... Selecting previously unselected package python3-setuptools. 
Preparing to unpack .../82-python3-setuptools_66.1.1-1_all.deb ... Unpacking python3-setuptools (66.1.1-1) ... Setting up media-types (10.0.0) ... Setting up libpipeline1:i386 (1.5.7-1) ... Setting up libicu72:i386 (72.1-3) ... Setting up bsdextrautils (2.38.1-5+b1) ... Setting up libmagic-mgc (1:5.44-3) ... Setting up libarchive-zip-perl (1.68-1) ... Setting up libyaml-0-2:i386 (0.2.5-1) ... Setting up libdebhelper-perl (13.11.4) ... Setting up libmagic1:i386 (1:5.44-3) ... Setting up gettext-base (0.21-12) ... Setting up m4 (1.4.19-3) ... Setting up file (1:5.44-3) ... Setting up tzdata (2023c-5+deb12u1) ... Current default time zone: 'Etc/UTC' Local time is now: Tue Jan 23 23:51:21 UTC 2024. Universal Time is now: Tue Jan 23 23:51:21 UTC 2024. Run 'dpkg-reconfigure tzdata' if you wish to change it. Setting up autotools-dev (20220109.1) ... Setting up libblas3:i386 (3.11.0-2) ... update-alternatives: using /usr/lib/i386-linux-gnu/blas/libblas.so.3 to provide /usr/lib/i386-linux-gnu/libblas.so.3 (libblas.so.3-i386-linux-gnu) in auto mode Setting up autopoint (0.21-12) ... Setting up libgfortran5:i386 (12.2.0-14) ... Setting up autoconf (2.71-3) ... Setting up sensible-utils (0.0.17+nmu1) ... Setting up libuchardet0:i386 (0.0.7-1) ... Setting up libsub-override-perl (0.09-4) ... Setting up openssl (3.0.11-1~deb12u2) ... Setting up libelf1:i386 (0.188-2.1) ... Setting up readline-common (8.2-1.3) ... Setting up libxml2:i386 (2.9.14+dfsg-1.3~deb12u1) ... Setting up automake (1:1.16.5-1.3) ... update-alternatives: using /usr/bin/automake-1.16 to provide /usr/bin/automake (automake) in auto mode Setting up libfile-stripnondeterminism-perl (1.13.1-1) ... Setting up liblapack3:i386 (3.11.0-2) ... update-alternatives: using /usr/lib/i386-linux-gnu/lapack/liblapack.so.3 to provide /usr/lib/i386-linux-gnu/liblapack.so.3 (liblapack.so.3-i386-linux-gnu) in auto mode Setting up gettext (0.21-12) ... Setting up libtool (2.4.7-5) ... Setting up libreadline8:i386 (8.2-1.3) ... 
Setting up intltool-debian (0.35.0+20060710.6) ... Setting up dh-autoreconf (20) ... Setting up ca-certificates (20230311) ... Updating certificates in /etc/ssl/certs... 140 added, 0 removed; done. Setting up dh-strip-nondeterminism (1.13.1-1) ... Setting up dwz (0.15-1) ... Setting up groff-base (1.22.4-10) ... Setting up po-debconf (1.0.21+nmu1) ... Setting up libpython3.11-stdlib:i386 (3.11.2-6) ... Setting up man-db (2.11.2-2) ... Not building database; man-db/auto-update is not 'true'. Setting up libpython3-stdlib:i386 (3.11.2-1+b1) ... Setting up python3.11 (3.11.2-6) ... Setting up debhelper (13.11.4) ... Setting up python3 (3.11.2-1+b1) ... Setting up python3-sortedcontainers (2.4.0-2) ... Setting up python3-markupsafe (2.1.2-1+b1) ... Setting up python3-psutil (5.9.4-1+b1) ... Setting up python3-tz (2022.7.1-4) ... Setting up python3-cloudpickle (2.2.0-1) ... Setting up python3-six (1.16.0-4) ... Setting up python3-jinja2 (3.1.2-1) ... Setting up python3-packaging (23.0-1) ... Setting up python3-flaky (3.7.0-2) ... Setting up python3-certifi (2022.9.24-1) ... Setting up python3-idna (3.3-1) ... Setting up python3-urllib3 (1.26.12-1) ... Setting up python3-pluggy (1.0.0+repack-1) ... Setting up python3-toolz (0.12.0-1) ... Setting up python3-dateutil (2.8.2-2) ... Setting up python3-msgpack (1.0.3-2+b1) ... Setting up python3-lib2to3 (3.11.2-3) ... Setting up python3-locket (1.0.0-1) ... Setting up python3-pkg-resources (66.1.1-1) ... Setting up python3-distutils (3.11.2-3) ... Setting up dh-python (5.20230130+deb12u1) ... Setting up python3-partd (1.3.0-1) ... Setting up python3-more-itertools (8.10.0-2) ... Setting up python3-heapdict (1.0.1-2) ... Setting up python3-iniconfig (1.1.1-2) ... Setting up python3-attr (22.2.0-1) ... Setting up python3-tornado (6.2.0-3) ... Setting up python3-setuptools (66.1.1-1) ... Setting up python3-tblib (1.7.0-3) ... Setting up python3-py (1.11.0-1) ... Setting up python3-colorama (0.4.6-2) ... 
Setting up python3-charset-normalizer (3.0.1-2) ... Setting up python3-pytest (7.2.1-2) ... Setting up python3-fsspec (2022.11.0-1) ... Setting up python3-all (3.11.2-1+b1) ... Setting up python3-yaml (6.0-3+b2) ... Setting up python3-click (8.1.3-2) ... Setting up python3-chardet (5.1.0+dfsg-2) ... Setting up python3-requests (2.28.1+dfsg-1) ... Setting up python3-numpy (1:1.24.2-1) ... Setting up python3-zict (2.2.0-1) ... Setting up python3-pandas-lib:i386 (1.5.3+dfsg-2) ... Setting up python3-dask (2022.12.1+dfsg-2) ... Setting up python3-distributed (2022.12.1+ds.1-3) ... Setting up python3-pandas (1.5.3+dfsg-2) ... Processing triggers for libc-bin (2.36-9+deb12u3) ... Processing triggers for ca-certificates (20230311) ... Updating certificates in /etc/ssl/certs... 0 added, 0 removed; done. Running hooks in /etc/ca-certificates/update.d... done. Reading package lists... Building dependency tree... Reading state information... Reading extended state information... Initializing package states... Writing extended state information... Building tag database... -> Finished parsing the build-deps I: Building the package I: Running cd /build/reproducible-path/python-streamz-0.6.4/ && env PATH="/usr/sbin:/usr/bin:/sbin:/bin:/usr/games" HOME="/nonexistent/first-build" dpkg-buildpackage -us -uc -b && env PATH="/usr/sbin:/usr/bin:/sbin:/bin:/usr/games" HOME="/nonexistent/first-build" dpkg-genchanges -S > ../python-streamz_0.6.4-1_source.changes dpkg-buildpackage: info: source package python-streamz dpkg-buildpackage: info: source version 0.6.4-1 dpkg-buildpackage: info: source distribution unstable dpkg-buildpackage: info: source changed by Nilesh Patra dpkg-source --before-build . 
dpkg-buildpackage: info: host architecture i386 debian/rules clean dh clean --with python3 --buildsystem=pybuild dh_auto_clean -O--buildsystem=pybuild I: pybuild base:240: python3.11 setup.py clean running clean removing '/build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build' (and everything under it) 'build/bdist.linux-i386' does not exist -- can't clean it 'build/scripts-3.11' does not exist -- can't clean it dh_autoreconf_clean -O--buildsystem=pybuild dh_clean -O--buildsystem=pybuild debian/rules binary dh binary --with python3 --buildsystem=pybuild dh_update_autotools_config -O--buildsystem=pybuild dh_autoreconf -O--buildsystem=pybuild dh_auto_configure -O--buildsystem=pybuild I: pybuild base:240: python3.11 setup.py config running config dh_auto_build -O--buildsystem=pybuild I: pybuild base:240: /usr/bin/python3 setup.py build running build running build_py creating /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/graph.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/river.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/orderedweakset.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/sinks.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/sources.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/utils_test.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/core.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/dask.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/collection.py 
-> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/batch.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/__init__.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/utils.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz copying streamz/plugins.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz creating /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/dataframe copying streamz/dataframe/aggregations.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/dataframe copying streamz/dataframe/core.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/dataframe copying streamz/dataframe/__init__.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/dataframe copying streamz/dataframe/utils.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/dataframe creating /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests copying streamz/tests/test_kafka.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests copying streamz/tests/test_graph.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests copying streamz/tests/test_core.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests copying streamz/tests/py3_test_core.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests copying streamz/tests/test_batch.py -> 
/build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests copying streamz/tests/test_sinks.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests copying streamz/tests/test_plugins.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests copying streamz/tests/test_sources.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests copying streamz/tests/__init__.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests copying streamz/tests/test_dask.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests creating /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests copying streamz/dataframe/tests/test_dataframe_utils.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests copying streamz/dataframe/tests/test_dataframes.py -> /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests dh_auto_test -O--buildsystem=pybuild I: pybuild base:240: cd /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build; python3.11 -m pytest ============================= test session starts ============================== platform linux -- Python 3.11.2, pytest-7.2.1, pluggy-1.0.0+repack rootdir: /build/reproducible-path/python-streamz-0.6.4, configfile: setup.cfg plugins: flaky-3.7.0 collected 1570 items / 2 skipped streamz/dataframe/tests/test_dataframe_utils.py .s.s [ 0%] streamz/dataframe/tests/test_dataframes.py ............................. [ 2%] ........................................................................ [ 6%] ...F...........F....F....F....F....F....F....F....F....F....F....F....F. 
[ 11%] ...F....F....F....F....F....F....F....F....F....F....F....F..FF....ss... [ 15%] .sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 20%] ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 25%] ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 29%] ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 34%] ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 38%] ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss [ 43%] s.....xxxxxxx........................................................... [ 47%] ........................F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X.. [ 52%] F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X.. [ 57%] F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X.. [ 61%] F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X..F..X.. [ 66%] F..X..F..X..F..X..F..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X. [ 70%] .FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..F [ 75%] F..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF. [ 80%] .X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X [ 84%] ..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X..FF..X.. [ 89%] ..FF.......... [ 90%] streamz/tests/test_batch.py .... [ 90%] streamz/tests/test_core.py ......................................F...... [ 93%] .....................s................................................. [ 97%] streamz/tests/test_dask.py .............s. [ 98%] streamz/tests/test_plugins.py .... [ 98%] streamz/tests/test_sinks.py .....ss [ 99%] streamz/tests/test_sources.py .XXXxx... 
[100%] =================================== FAILURES =================================== _______________________ test_dataframe_simple[1] _______________________ func = <function <lambda> at 0xb2b74348> @pytest.mark.parametrize('func', [ lambda df: df.query('x > 1 and x < 4', engine='python'), lambda df: df.x.value_counts().nlargest(2) ]) def test_dataframe_simple(func): df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]}) expected = func(df) a = DataFrame(example=df) L = func(a).stream.sink_to_list() a.emit(df) > assert_eq(L[0], expected) streamz/dataframe/tests/test_dataframes.py:191: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 1 1 2 1 Name: x, dtype: int32, b = 1 1 2 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a,
pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __________ test_groupby_aggregate[core-0-0-2] __________ agg = <function <lambda> at 0xb2b74618> grouper = <function <lambda> at 0xb2b74758> indexer = <function <lambda> at 0xb2b74708>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 2 1.0 1 Name: y, dtype: int32 b = x 0.0 2 1.0 1 Name: y, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt =
type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __________ test_groupby_aggregate[core-0-1-2] __________ agg = <function <lambda> at 0xb2b74618> grouper = <function <lambda> at 0xb2b747a8> indexer = <function <lambda> at 0xb2b74708>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] >
assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 2 1.0 1 Name: y, dtype: int32 b = x 0.0 2 1.0 1 Name: y, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __________ test_groupby_aggregate[core-0-2-2] __________ agg = <function <lambda> at 0xb2b74618> grouper = <function <lambda> at 0xb2b747f8> indexer = <function <lambda> at 0xb2b74708>, stream = @pytest.mark.parametrize('agg', [ lambda x:
x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 2 1 1 Name: y, dtype: int32, b = 0 2 1 1 Name: y, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, 
check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __________ test_groupby_aggregate[core-0-3-2] __________ agg = <function <lambda> at 0xb2b74618> grouper = <function <lambda> at 0xb2b74848> indexer = <function <lambda> at 0xb2b74708>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 2 1.0 1 Name: y, dtype: int32 b = x 0.0 2 1.0 1 Name: y, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True,
scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __________ test_groupby_aggregate[core-1-0-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b74758> indexer = at 0xb2b74898>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 
2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y -overlapped-index-name-0 0.0 2 2 1.0 1 1 b = x y -overlapped-index-name-0 0.0 2 2 1.0 1 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: 
AssertionError __________ test_groupby_aggregate[core-1-1-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b747a8> indexer = at 0xb2b74898>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y x 0.0 2 1.0 1, b = y x 0.0 2 1.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if 
hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[core-1-2-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b747f8> indexer = at 0xb2b74898>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0 2 2 1 1 1, b = x y 0 2 2 1 1 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, 
check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[core-1-3-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b74848> indexer = at 0xb2b74898>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 
5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y x 0.0 2 1.0 1, b = y x 0.0 2 1.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[core-2-0-2] __________ agg = at 0xb2b74618> grouper = 
at 0xb2b74758> indexer = at 0xb2b748e8>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y x 0.0 2 1.0 1, b = y x 0.0 2 1.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, 
(pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[core-2-1-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b747a8> indexer = at 0xb2b748e8>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y x 0.0 2 1.0 1, b = y x 0.0 2 1.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: 
assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[core-2-2-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b747f8> indexer = at 0xb2b748e8>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = 
f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 0 2 1 1, b = y 0 2 1 1, check_names = True, check_dtype = True check_divisions = True, check_index = True, sort_results = True scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[core-2-3-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b74848> indexer = at 0xb2b748e8>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), 
lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y x 0.0 2 1.0 1, b = y x 0.0 2 1.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not 
check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[dask-0-0-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b74758> indexer = at 0xb2b74708>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 2 1.0 1 Name: y, dtype: int32 b = x 0.0 2 1.0 1 Name: y, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if 
hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __________ test_groupby_aggregate[dask-0-1-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b747a8> indexer = at 0xb2b74708>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) 
L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 2 1.0 1 Name: y, dtype: int32 b = x 0.0 2 1.0 1 Name: y, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __________ test_groupby_aggregate[dask-0-2-2] __________ agg = at 
0xb2b74618> grouper = at 0xb2b747f8> indexer = at 0xb2b74708>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 2 1 1 Name: y, dtype: int32, b = 0 2 1 1 Name: y, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, 
"to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __________ test_groupby_aggregate[dask-0-3-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b74848> indexer = at 0xb2b74708>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 2 1.0 1 Name: y, dtype: int32 b = x 0.0 2 1.0 1 Name: y, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def 
assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __________ test_groupby_aggregate[dask-1-0-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b74758> indexer = at 0xb2b74898>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] 
]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y -overlapped-index-name-0 0.0 2 2 1.0 1 1 b = x y -overlapped-index-name-0 0.0 2 2 1.0 1 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are 
different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[dask-1-1-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b747a8> indexer = at 0xb2b74898>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y x 0.0 2 1.0 1, b = y x 0.0 2 1.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) 
b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[dask-1-2-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b747f8> indexer = at 0xb2b74898>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0 2 2 1 1 1, b = x y 0 2 2 1 1 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 
'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[dask-1-3-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b74848> indexer = at 0xb2b74898>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df 
= pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y x 0.0 2 1.0 1, b = y x 0.0 2 1.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ 
test_groupby_aggregate[dask-2-0-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b74758> indexer = at 0xb2b748e8>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y x 0.0 2 1.0 1, b = y x 0.0 2 1.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = 
a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[dask-2-1-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b747a8> indexer = at 0xb2b748e8>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y x 0.0 2 1.0 1, b = y x 0.0 2 1.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, 
sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[dask-2-2-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b747f8> indexer = at 0xb2b748e8>, stream = @pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], 
stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 0 2 1 1, b = y 0 2 1 1, check_names = True, check_dtype = True check_divisions = True, check_index = True, sort_results = True scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __________ test_groupby_aggregate[dask-2-3-2] __________ agg = at 0xb2b74618> grouper = at 0xb2b74848> indexer = at 0xb2b748e8>, stream = 
@pytest.mark.parametrize('agg', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), # pytest.mark.xfail(lambda x: x.var(ddof=0), reason="don't know") ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'x', lambda a: a.index % 2, lambda a: ['x'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.y, lambda g: g, lambda g: g[['y']] # lambda g: g[['x', 'y']] ]) def test_groupby_aggregate(agg, grouper, indexer, stream): df = pd.DataFrame({'x': (np.arange(10) // 2).astype(float), 'y': [1.0, 2.0] * 5}) a = DataFrame(example=df.iloc[:0], stream=stream) def f(x): return agg(indexer(x.groupby(grouper(x)))) L = f(a).stream.gather().sink_to_list() a.emit(df.iloc[:3]) a.emit(df.iloc[3:7]) a.emit(df.iloc[7:]) first = df.iloc[:3] > assert assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:301: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y x 0.0 2 1.0 1, b = y x 0.0 2 1.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = 
_maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="y") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___________________________ test_value_counts[core] ____________________________ stream = def test_value_counts(stream): s = pd.Series(['a', 'b', 'a']) a = Series(example=s, stream=stream) b = a.value_counts() assert b._stream_type == 'updating' result = b.stream.gather().sink_to_list() a.emit(s) a.emit(s) > assert_eq(result[-1], pd.concat([s, s], axis=0).value_counts()) streamz/dataframe/tests/test_dataframes.py:317: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = a 4 b 2 dtype: int32, b = a 4 b 2 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, 
pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError ___________________________ test_value_counts[dask] ____________________________ stream = def test_value_counts(stream): s = pd.Series(['a', 'b', 'a']) a = Series(example=s, stream=stream) b = a.value_counts() assert b._stream_type == 'updating' result = b.stream.gather().sink_to_list() a.emit(s) a.emit(s) > assert_eq(result[-1], pd.concat([s, s], axis=0).value_counts()) streamz/dataframe/tests/test_dataframes.py:317: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = a 4 b 2 dtype: int32, b = a 4 b 2 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) 
if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __ test_groupby_windowing_value[0-0-0-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e0c8> indexer = at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) 
assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 3 1.0 2 2.0 2 3.0 3 Name: x, dtype: int32 b = x 0.0 3 1.0 2 2.0 2 3.0 3 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError ___ test_groupby_windowing_value[0-0-0-1d-2] ___ func = at 
0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e0c8> indexer = at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 4 1.0 3 2.0 3 3.0 3 Name: x, dtype: int32 b = x 0.0 4 1.0 3 2.0 3 3.0 3 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, 
"divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __ test_groupby_windowing_value[0-0-1-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e0c8> indexer = at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', 
end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 3 1.0 2 2.0 2 3.0 3 Name: x, dtype: int32 b = x 0.0 3 1.0 2 2.0 2 3.0 3 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, 
check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError ___ test_groupby_windowing_value[0-0-1-1d-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = <function <lambda> at 0xb2b7e078> grouper = <function <lambda> at 0xb2b7e0c8> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 4 1.0 3 2.0 3 3.0 3 Name: x, dtype: int32 b = x 0.0 4 1.0 3 2.0 3 3.0 3 Name: x, dtype: int64 check_names =
True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __ test_groupby_windowing_value[0-1-0-10h-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = <function <lambda> at 0xb2b7e028> grouper = <function <lambda> at 0xb2b7e118> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d'])
@pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 0.0 5 1.0 5 Name: x, dtype: int32 b = y 0.0 5 1.0 5 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) 
if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError ___ test_groupby_windowing_value[0-1-0-1d-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = <function <lambda> at 0xb2b7e028> grouper = <function <lambda> at 0xb2b7e118> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0])
assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 0.0 7 1.0 6 Name: x, dtype: int32 b = y 0.0 7 1.0 6 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __ test_groupby_windowing_value[0-1-1-10h-2] ___ func = <function <lambda> at 0xb2b7ce88>, value =
Timedelta('0 days 10:00:00') getter = <function <lambda> at 0xb2b7e078> grouper = <function <lambda> at 0xb2b7e118> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 0.0 5 1.0 5 Name: x, dtype: int32 b = y 0.0 5 1.0 5 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at =
type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError ___ test_groupby_windowing_value[0-1-1-1d-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = <function <lambda> at 0xb2b7e078> grouper = <function <lambda> at 0xb2b7e118> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03',
freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 0.0 7 1.0 6 Name: x, dtype: int32 b = y 0.0 7 1.0 6 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif 
isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __ test_groupby_windowing_value[0-2-0-10h-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = <function <lambda> at 0xb2b7e028> grouper = <function <lambda> at 0xb2b7e168> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 2000-01-01 03:00:00 1 2000-01-01 04:00:00 1 2000-01-01 05:00:00 1 2000-01-01 06:00:00 1 2000-01-01 07:00:0...00-01-01 09:00:00 1 2000-01-01 10:00:00 1
2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 Name: x, dtype: int32 b = 2000-01-01 03:00:00 1 2000-01-01 04:00:00 1 2000-01-01 05:00:00 1 2000-01-01 06:00:00 1 2000-01-01 07:00:0...00-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError ___ test_groupby_windowing_value[0-2-0-1d-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = <function <lambda> at 0xb2b7e028> grouper =
<function <lambda> at 0xb2b7e168> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 2000-01-01 00:00:00 1 2000-01-01 01:00:00 1 2000-01-01 02:00:00 1 2000-01-01 03:00:00 1 2000-01-01 04:00:0...00-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 Name: x, dtype: int32 b = 2000-01-01 00:00:00 1 2000-01-01 01:00:00 1 2000-01-01 02:00:00 1 2000-01-01 03:00:00 1 2000-01-01 04:00:0...00-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True,
check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __ test_groupby_windowing_value[0-2-1-10h-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = <function <lambda> at 0xb2b7e078> grouper = <function <lambda> at 0xb2b7e168> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index,
lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 2000-01-01 03:00:00 1 2000-01-01 04:00:00 1 2000-01-01 05:00:00 1 2000-01-01 06:00:00 1 2000-01-01 07:00:0...00-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 Name: x, dtype: int32 b = 2000-01-01 03:00:00 1 2000-01-01 04:00:00 1 2000-01-01 05:00:00 1 2000-01-01 06:00:00 1 2000-01-01 07:00:0...00-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) 
assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError ___ test_groupby_windowing_value[0-2-1-1d-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = <function <lambda> at 0xb2b7e078> grouper = <function <lambda> at 0xb2b7e168> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return
func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 2000-01-01 00:00:00 1 2000-01-01 01:00:00 1 2000-01-01 02:00:00 1 2000-01-01 03:00:00 1 2000-01-01 04:00:0...00-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 Name: x, dtype: int32 b = 2000-01-01 00:00:00 1 2000-01-01 01:00:00 1 2000-01-01 02:00:00 1 2000-01-01 03:00:00 1 2000-01-01 04:00:0...00-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = 
a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __ test_groupby_windowing_value[0-3-0-10h-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = <function <lambda> at 0xb2b7e028> grouper = <function <lambda> at 0xb2b7e1b8> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _ a = y 0.0 5 1.0 5 Name: x, dtype: int32 b = y 0.0 5 1.0 5 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError ___ test_groupby_windowing_value[0-3-0-1d-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = <function <lambda> at 0xb2b7e028> grouper = <function <lambda> at 0xb2b7e1b8> indexer = <function <lambda> at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(),
pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 0.0 7 1.0 6 Name: x, dtype: int32 b = y 0.0 7 1.0 6 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, 
check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __ test_groupby_windowing_value[0-3-1-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e1b8> indexer = at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = 
f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 0.0 5 1.0 5 Name: x, dtype: int32 b = y 0.0 5 1.0 5 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 
E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError ___ test_groupby_windowing_value[0-3-1-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e1b8> indexer = at 0xb2b7e208> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 0.0 7 1.0 6 Name: x, dtype: int32 b = y 0.0 7 1.0 6 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: 
assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError __ test_groupby_windowing_value[1-0-0-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e0c8> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] 
]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y -overlapped-index-name-0 0.0 3 3 1.0 2 2 2.0 2 2 3.0 3 3 b = x y -overlapped-index-name-0 0.0 3 3 1.0 2 2 2.0 2 2 3.0 3 3 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = 
a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[1-0-0-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e0c8> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y -overlapped-index-name-0 0.0 4 4 1.0 3 3 2.0 3 3 3.0 3 3 b = 
x y -overlapped-index-name-0 0.0 4 4 1.0 3 3 2.0 3 3 3.0 3 3 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[1-0-1-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e0c8> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) 
@pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y -overlapped-index-name-0 0.0 3 3 1.0 2 2 2.0 2 2 3.0 3 3 b = x y -overlapped-index-name-0 0.0 3 3 1.0 2 2 2.0 2 2 3.0 3 3 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, 
check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[1-0-1-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e0c8> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 
5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y -overlapped-index-name-0 0.0 4 4 1.0 3 3 2.0 3 3 3.0 3 3 b = x y -overlapped-index-name-0 0.0 4 4 1.0 3 3 2.0 3 3 3.0 3 3 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[1-1-0-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e028> grouper = 
at 0xb2b7e118> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 5 1.0 5, b = x y 0.0 5 1.0 5 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion 
assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[1-1-0-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e118> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = 
f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 7 1.0 6, b = x y 0.0 7 1.0 6 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ 
test_groupby_windowing_value[1-1-1-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e118> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 5 1.0 5, b = x y 0.0 5 1.0 5 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, 
"divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[1-1-1-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e118> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': 
np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 7 1.0 6, b = x y 0.0 7 1.0 6 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E 
[left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[1-2-0-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e168> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 2000-01-01 03:00:00 1 1 2000-01-01 04:00:00 1 1 2000-01-01 05:00:00 1 1 2000-01-01 06:... 08:00:00 1 1 2000-01-01 09:00:00 1 1 2000-01-01 10:00:00 1 1 2000-01-01 11:00:00 1 1 2000-01-01 12:00:00 1 1 b = x y 2000-01-01 03:00:00 1 1 2000-01-01 04:00:00 1 1 2000-01-01 05:00:00 1 1 2000-01-01 06:... 
08:00:00 1 1 2000-01-01 09:00:00 1 1 2000-01-01 10:00:00 1 1 2000-01-01 11:00:00 1 1 2000-01-01 12:00:00 1 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[1-2-0-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e168> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) 
@pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 2000-01-01 00:00:00 1 1 2000-01-01 01:00:00 1 1 2000-01-01 02:00:00 1 1 2000-01-01 03:... 08:00:00 1 1 2000-01-01 09:00:00 1 1 2000-01-01 10:00:00 1 1 2000-01-01 11:00:00 1 1 2000-01-01 12:00:00 1 1 b = x y 2000-01-01 00:00:00 1 1 2000-01-01 01:00:00 1 1 2000-01-01 02:00:00 1 1 2000-01-01 03:... 
08:00:00 1 1 2000-01-01 09:00:00 1 1 2000-01-01 10:00:00 1 1 2000-01-01 11:00:00 1 1 2000-01-01 12:00:00 1 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[1-2-1-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e168> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) 
@pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 2000-01-01 03:00:00 1 1 2000-01-01 04:00:00 1 1 2000-01-01 05:00:00 1 1 2000-01-01 06:... 08:00:00 1 1 2000-01-01 09:00:00 1 1 2000-01-01 10:00:00 1 1 2000-01-01 11:00:00 1 1 2000-01-01 12:00:00 1 1 b = x y 2000-01-01 03:00:00 1 1 2000-01-01 04:00:00 1 1 2000-01-01 05:00:00 1 1 2000-01-01 06:... 
08:00:00 1 1 2000-01-01 09:00:00 1 1 2000-01-01 10:00:00 1 1 2000-01-01 11:00:00 1 1 2000-01-01 12:00:00 1 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[1-2-1-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e168> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) 
@pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 2000-01-01 00:00:00 1 1 2000-01-01 01:00:00 1 1 2000-01-01 02:00:00 1 1 2000-01-01 03:... 08:00:00 1 1 2000-01-01 09:00:00 1 1 2000-01-01 10:00:00 1 1 2000-01-01 11:00:00 1 1 2000-01-01 12:00:00 1 1 b = x y 2000-01-01 00:00:00 1 1 2000-01-01 01:00:00 1 1 2000-01-01 02:00:00 1 1 2000-01-01 03:... 
08:00:00 1 1 2000-01-01 09:00:00 1 1 2000-01-01 10:00:00 1 1 2000-01-01 11:00:00 1 1 2000-01-01 12:00:00 1 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[1-3-0-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e1b8> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) 
@pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 5 1.0 5, b = x y 0.0 5 1.0 5 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, 
scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[1-3-0-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e1b8> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = 
first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 7 1.0 6, b = x y 0.0 7 1.0 6 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[1-3-1-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e1b8> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: 
x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 5 1.0 5, b = x y 0.0 5 1.0 5 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, 
check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[1-3-1-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e1b8> indexer = at 0xb2b7e258> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): 
sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 7 1.0 6, b = x y 0.0 7 1.0 6 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[2-0-0-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e0c8> 
indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x -overlapped-index-name-0 0.0 3 1.0 2 2.0 2 3.0 3 b = x -overlapped-index-name-0 0.0 3 1.0 2 2.0 2 3.0 3 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = 
type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[2-0-0-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e0c8> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def 
f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x -overlapped-index-name-0 0.0 4 1.0 3 2.0 3 3.0 3 b = x -overlapped-index-name-0 0.0 4 1.0 3 2.0 3 3.0 3 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E 
[right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[2-0-1-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e0c8> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x -overlapped-index-name-0 0.0 3 1.0 2 2.0 2 3.0 3 b = x -overlapped-index-name-0 0.0 3 1.0 2 2.0 2 3.0 3 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", 
**kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[2-0-1-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e0c8> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, 
grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x -overlapped-index-name-0 0.0 4 1.0 3 2.0 3 3.0 3 b = x -overlapped-index-name-0 0.0 4 1.0 3 2.0 3 3.0 3 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, 
pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[2-1-0-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e118> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 5 1.0 5, b = x y 0.0 5 1.0 5 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = 
True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[2-1-0-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e118> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] 
]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 7 1.0 6, b = x y 0.0 7 1.0 6 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) 
if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[2-1-1-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e118> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 5 1.0 5, b = x y 0.0 5 1.0 5 
check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[2-1-1-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e118> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) 
@pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 7 1.0 6, b = x y 0.0 7 1.0 6 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, 
(pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[2-2-0-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e168> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2000-01-01 03:00:00 1 2000-01-01 04:00:00 1 2000-01-01 05:00:00 1 2000-01-01 06:00:00 1 200...0 1 2000-01-01 08:00:00 1 2000-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 b = x 2000-01-01 03:00:00 1 2000-01-01 04:00:00 1 2000-01-01 05:00:00 1 2000-01-01 06:00:00 1 200...0 1 2000-01-01 08:00:00 1 2000-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ 
test_groupby_windowing_value[2-2-0-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e168> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2000-01-01 00:00:00 1 2000-01-01 01:00:00 1 2000-01-01 02:00:00 1 2000-01-01 03:00:00 1 200...0 1 2000-01-01 08:00:00 1 2000-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 b = x 2000-01-01 00:00:00 1 2000-01-01 01:00:00 1 2000-01-01 02:00:00 1 2000-01-01 03:00:00 1 200...0 1 2000-01-01 08:00:00 1 2000-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 check_names = True, check_dtype = True, 
check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[2-2-1-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e168> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: 
a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2000-01-01 03:00:00 1 2000-01-01 04:00:00 1 2000-01-01 05:00:00 1 2000-01-01 06:00:00 1 200...0 1 2000-01-01 08:00:00 1 2000-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 b = x 2000-01-01 03:00:00 1 2000-01-01 04:00:00 1 2000-01-01 05:00:00 1 2000-01-01 06:00:00 1 200...0 1 2000-01-01 08:00:00 1 2000-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) 
assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[2-2-1-1d-2] ___ func = at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = at 0xb2b7e078> grouper = at 0xb2b7e168> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = 
f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2000-01-01 00:00:00 1 2000-01-01 01:00:00 1 2000-01-01 02:00:00 1 2000-01-01 03:00:00 1 200...0 1 2000-01-01 08:00:00 1 2000-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 b = x 2000-01-01 00:00:00 1 2000-01-01 01:00:00 1 2000-01-01 02:00:00 1 2000-01-01 03:00:00 1 200...0 1 2000-01-01 08:00:00 1 2000-01-01 09:00:00 1 2000-01-01 10:00:00 1 2000-01-01 11:00:00 1 2000-01-01 12:00:00 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) 
if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[2-3-0-10h-2] ___ func = at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = at 0xb2b7e028> grouper = at 0xb2b7e1b8> indexer = at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 5 1.0 5, b = x y 0.0 5 1.0 5 check_names = True, check_dtype = True, check_divisions = True check_index = True, 
sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[2-3-0-1d-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = <function <lambda> at 0xb2b7e028> grouper = <function <lambda> at 0xb2b7e1b8> indexer = <function <lambda> at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, 
lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 7 1.0 6, b = x y 0.0 7 1.0 6 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = 
_maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError __ test_groupby_windowing_value[2-3-1-10h-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('0 days 10:00:00') getter = <function <lambda> at 0xb2b7e078> grouper = <function <lambda> at 0xb2b7e1b8> indexer = <function <lambda> at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 5 1.0 
5, b = x y 0.0 5 1.0 5 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ___ test_groupby_windowing_value[2-3-1-1d-2] ___ func = <function <lambda> at 0xb2b7ce88>, value = Timedelta('1 days 00:00:00') getter = <function <lambda> at 0xb2b7e078> grouper = <function <lambda> at 0xb2b7e1b8> indexer = <function <lambda> at 0xb2b7e2a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.var(ddof=1), lambda x: x.std(), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('value', ['10h', '1d']) @pytest.mark.parametrize('getter', [ lambda df: df, lambda 
df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 4, lambda a: 'y', lambda a: a.index, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_value(func, value, getter, grouper, indexer): index = pd.date_range(start='2000-01-01', end='2000-01-03', freq='1h') df = pd.DataFrame({'x': np.arange(len(index), dtype=float), 'y': np.arange(len(index), dtype=float) % 2}, index=index) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(value)).stream.gather().sink_to_list() value = pd.Timedelta(value) diff = 13 for i in range(0, len(index), diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[:diff] first = first[first.index.max() - value + pd.Timedelta('1ns'):] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:849: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0.0 7 1.0 6, b = x y 0.0 7 1.0 6 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if 
isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[0-0-0-1-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 1 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e618> indexer = <function <lambda> at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2.0 1 Name: x, dtype: int32, b = x 2.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index 
= True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-0-0-1-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 1 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e618> indexer = <function <lambda> at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) 
@pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2.0 1 Name: x, dtype: int32, b = x 2.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = 
b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-0-0-4-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 4 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e618> indexer = <function <lambda> at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 1 1.0 1 2.0 1 Name: x, dtype: int32 b = x 0.0 1 1.0 1 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} 
def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-0-0-4-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 4 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e618> indexer = <function <lambda> at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', 
lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 1 1.0 1 2.0 1 Name: x, dtype: int32 b = x 0.0 1 1.0 1 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): 
tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-0-1-1-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 1 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e618> indexer = <function <lambda> at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2.0 1 Name: x, dtype: int32, b = x 2.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, 
check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-0-1-1-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 1 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e618> indexer = <function <lambda> at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', 
[ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2.0 1 Name: x, dtype: int32, b = x 2.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, 
pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-0-1-4-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 4 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e618> indexer = <function <lambda> at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 1 1.0 1 2.0 1 Name: x, dtype: int32 b = x 0.0 1 1.0 1 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: 
assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-0-1-4-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 4 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e618> indexer = <function <lambda> at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def 
test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 1 1.0 1 2.0 1 Name: x, dtype: int32 b = x 0.0 1 1.0 1 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, 
check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-1-0-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 Name: x, dtype: int32, b = y 1.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, 
scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-1-0-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = 
pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 Name: x, dtype: int32, b = y 1.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are 
different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-1-0-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 Name: x, dtype: int32 b = y 1.0 2 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = 
type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-1-0-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = 
DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 Name: x, dtype: int32 b = y 1.0 2 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E 
[right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-1-1-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e668> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 Name: x, dtype: int32, b = y 1.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = 
type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-1-1-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e668> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return 
func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 Name: x, dtype: int32, b = y 1.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 
/usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-1-1-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e668> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 Name: x, dtype: int32 b = y 1.0 2 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = 
type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-1-1-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e668> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return 
func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 Name: x, dtype: int32 b = y 1.0 2 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 
/usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-2-0-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 1 Name: x, dtype: int32, b = 0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # 
scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-2-0-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = 
f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 1 Name: x, dtype: int32, b = 0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ 
test_groupby_windowing_n[0-2-0-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 2 1 1 Name: x, dtype: int32, b = 0 2 1 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) 
assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-2-0-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): 
sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 2 1 1 Name: x, dtype: int32, b = 0 2 1 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-2-1-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 
0xb2b7e5c8> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 1 Name: x, dtype: int32, b = 0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, 
scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-2-1-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] 
> assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 1 Name: x, dtype: int32, b = 0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-2-1-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda 
x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 2 1 1 Name: x, dtype: int32, b = 0 2 1 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if 
hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-2-1-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 2 1 1 Name: x, dtype: int32, b = 0 2 1 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-3-0-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e708> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), 
pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 Name: x, dtype: int32, b = y 1.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, 
(pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-3-0-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e708> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 Name: x, dtype: int32, b = y 1.0 1 Name: x, dtype: int64 
check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-3-0-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e708> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) 
@pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 Name: x, dtype: int32 b = y 1.0 2 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, 
check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-3-0-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e708> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 Name: x, dtype: int32 b = y 1.0 2 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True 
sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-3-1-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e708> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) 
@pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 Name: x, dtype: int32, b = y 1.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = 
b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-3-1-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e708> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 Name: x, dtype: int32, b = y 1.0 1 Name: x, dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, 
check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-3-1-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e708> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 
2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 Name: x, dtype: int32 b = y 1.0 2 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, 
check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[0-3-1-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e708> indexer = at 0xb2b7e758> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 Name: x, dtype: int32 b = y 1.0 2 2.0 1 Name: x, dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, 
sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-0-0-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, 
lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y -overlapped-index-name-0 2.0 1 1 b = x y -overlapped-index-name-0 2.0 1 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: 
Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-0-0-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2.0 1 dtype: int32, b = x 2.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, 
"divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-0-0-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 
5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y -overlapped-index-name-0 0.0 1 1 1.0 1 1 2.0 1 1 b = x y -overlapped-index-name-0 0.0 1 1 1.0 1 1 2.0 1 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 
/usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-0-0-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 1 1.0 1 2.0 1 dtype: int32 b = x 0.0 1 1.0 1 2.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = 
type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-0-1-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return 
func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y -overlapped-index-name-0 2.0 1 1 b = x y -overlapped-index-name-0 2.0 1 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ 
test_groupby_windowing_n[1-0-1-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2.0 1 dtype: int32, b = x 2.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = 
_check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-0-1-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) 
sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y -overlapped-index-name-0 0.0 1 1 1.0 1 1 2.0 1 1 b = x y -overlapped-index-name-0 0.0 1 1 1.0 1 1 2.0 1 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-0-1-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7a8> 
@pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 1 1.0 1 2.0 1 dtype: int32 b = x 0.0 1 1.0 1 2.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, 
check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-1-0-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) 
streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 1, b = x y 1.0 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-1-0-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), 
marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 dtype: int32, b = y 1.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = 
_maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-1-0-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 2 2.0 1, b = x y 1.0 2 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = 
True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-1-0-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda 
a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 dtype: int32, b = y 1.0 2 2.0 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, 
check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-1-1-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 1, b = x y 1.0 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: 
assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-1-1-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': 
np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 dtype: int32, b = y 1.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are 
different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-1-1-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 2 2.0 1, b = x y 1.0 2 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = 
type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-1-1-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, 
diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 dtype: int32, b = y 1.0 2 2.0 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-2-0-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 
0xb2b7e578> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0 1 1, b = x y 0 1 1, check_names = True, check_dtype = True check_divisions = True, check_index = True, sort_results = True scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( 
b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-2-0-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 1 dtype: int32, b = 0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-2-0-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), 
marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0 2 2 1 1 1, b = x y 0 2 2 1 1 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, 
check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-2-0-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 2 1 1 dtype: int32, b = 0 2 1 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, 
check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-2-1-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 
2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0 1 1, b = x y 0 1 1, check_names = True, check_dtype = True check_divisions = True, check_index = True, sort_results = True scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E 
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-2-1-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 1 dtype: int32, b = 0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, 
"divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-2-1-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e6b8> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 
5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 0 2 2 1 1 1, b = x y 0 2 2 1 1 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ 
test_groupby_windowing_n[1-2-1-4-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 4 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e6b8> indexer = <function <lambda> at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 2 1 1 dtype: int32, b = 0 2 1 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = 
_check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-3-0-1-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 1 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e708> indexer = <function <lambda> at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) 
sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 1, b = x y 1.0 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-3-0-1-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 1 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e708> indexer = <function <lambda> at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda 
x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 dtype: int32, b = y 1.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, 
"to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-3-0-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e708> indexer = at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 2 2.0 
1, b = x y 1.0 2 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-3-0-4-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 4 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e708> indexer = <function <lambda> at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) 
@pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 dtype: int32, b = y 1.0 2 2.0 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = 
b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-3-1-1-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 1 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e708> indexer = <function <lambda> at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 1, b = x y 1.0 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, 
check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-3-1-1-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 1 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e708> indexer = <function <lambda> at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 
'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 dtype: int32, b = y 1.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, 
check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[1-3-1-4-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 4 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e708> indexer = <function <lambda> at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 2 2.0 1, b = x y 1.0 2 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and 
hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[1-3-1-4-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 4 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e708> indexer = <function <lambda> at 0xb2b7e7a8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return 
func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 dtype: int32, b = y 1.0 2 2.0 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 
/usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-0-0-1-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 1 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e618> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x -overlapped-index-name-0 2.0 1 b = x -overlapped-index-name-0 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = 
type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-0-0-1-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 1 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e618> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, 
diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2.0 1 dtype: int32, b = x 2.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-0-0-4-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 4 getter = <function <lambda> at 0xb2b7e578> 
grouper = at 0xb2b7e618> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x -overlapped-index-name-0 0.0 1 1.0 1 2.0 1 b = x -overlapped-index-name-0 0.0 1 1.0 1 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, 
check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-0-0-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) 
streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 1 1.0 1 2.0 1 dtype: int32 b = x 0.0 1 1.0 1 2.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-0-1-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), 
lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x -overlapped-index-name-0 2.0 1 b = x -overlapped-index-name-0 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, 
"to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-0-1-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 2.0 1 dtype: int32, b = x 2.0 1 dtype: int64, check_names 
= True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-0-1-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) 
@pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x -overlapped-index-name-0 0.0 1 1.0 1 2.0 1 b = x -overlapped-index-name-0 0.0 1 1.0 1 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b 
= _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-0-1-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e618> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0.0 1 1.0 1 2.0 1 dtype: int32 b = x 0.0 1 1.0 1 2.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, 
check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-1-0-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 
2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 1, b = x y 1.0 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E 
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-1-0-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 dtype: int32, b = y 1.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and 
hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-1-0-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': 
[1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 2 2.0 1, b = x y 1.0 2 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ 
test_groupby_windowing_n[2-1-0-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 dtype: int32, b = y 1.0 2 2.0 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) 
assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-1-1-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): 
sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 1, b = x y 1.0 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-1-1-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e668> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: 
x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 dtype: int32, b = y 1.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): 
a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-1-1-4-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 4 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e668> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ a = x y 1.0 2 2.0 1, b = x y 1.0 2 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-1-1-4-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 4 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e668> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda
df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 dtype: int32, b = y 1.0 2 2.0 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) 
b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-2-0-1-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 1 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e6b8> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0 1, b = x 0 1, check_names = True, check_dtype = True check_divisions = True, check_index = True, sort_results = True scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True,
check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-2-0-1-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 1 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e6b8> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x',
'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 1 dtype: int32, b = 0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, 
**kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-2-0-4-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 4 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e6b8> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0 2 1 1, b = x 0 2 1 1, check_names = True, check_dtype = True check_divisions = True, check_index = True, sort_results = True scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at =
type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-2-0-4-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 4 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e6b8> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L =
f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 2 1 1 dtype: int32, b = 0 2 1 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ 
test_groupby_windowing_n[2-2-1-1-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 1 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e6b8> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0 1, b = x 0 1, check_names = True, check_dtype = True check_divisions = True, check_index = True, sort_results = True scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a,
check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-2-1-1-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 1 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e6b8> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first))
streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 1 dtype: int32, b = 0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-2-1-4-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 4 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e6b8> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x:
x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x 0 2 1 1, b = x 0 2 1 1, check_names = True, check_dtype = True check_divisions = True, check_index = True, sort_results = True scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() 
if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-2-1-4-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 4 getter = <function <lambda> at 0xb2b7e5c8> grouper = <function <lambda> at 0xb2b7e6b8> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = 0 2 1 1 dtype: int32, b = 0 2 1 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True,
sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-3-0-1-2] ______ func = <function <lambda> at 0xb2b7e3e8>, n = 1 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e708> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ])
@pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 1, b = x y 1.0 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame):
> tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-3-0-1-3] ______ func = <function <lambda> at 0xb2b7e438>, n = 1 getter = <function <lambda> at 0xb2b7e578> grouper = <function <lambda> at 0xb2b7e708> indexer = <function <lambda> at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 dtype: int32, b = y 1.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a,
scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-3-0-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e708> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def 
test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 2 2.0 1, b = x y 1.0 2 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: 
int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-3-0-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e578> grouper = at 0xb2b7e708> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 dtype: int32, b = y 1.0 2 2.0 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = 
type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-3-1-1-2] ______ func = at 0xb2b7e3e8>, n = 1 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e708> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return 
func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 1, b = x y 1.0 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-3-1-1-3] ______ func = at 0xb2b7e438>, n = 1 getter = at 
0xb2b7e5c8> grouper = at 0xb2b7e708> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 1 dtype: int32, b = y 1.0 1 dtype: int64, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, 
scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError _____ test_groupby_windowing_n[2-3-1-4-2] ______ func = at 0xb2b7e3e8>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e708> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] 
> assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = x y 1.0 2 2.0 1, b = x y 1.0 2 2.0 1 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="x") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError _____ test_groupby_windowing_n[2-3-1-4-3] ______ func = at 0xb2b7e438>, n = 4 getter = at 0xb2b7e5c8> grouper = at 0xb2b7e708> indexer = at 0xb2b7e7f8> @pytest.mark.parametrize('func', [ lambda x: x.sum(), lambda x: x.mean(), lambda x: x.count(), lambda x: x.size(), lambda x: x.var(ddof=1), lambda x: x.std(ddof=1), 
pytest.param(lambda x: x.var(ddof=0), marks=pytest.mark.xfail), ]) @pytest.mark.parametrize('n', [1, 4]) @pytest.mark.parametrize('getter', [ lambda df: df, lambda df: df.x, ]) @pytest.mark.parametrize('grouper', [ lambda a: a.x % 3, lambda a: 'y', lambda a: a.index % 2, lambda a: ['y'] ]) @pytest.mark.parametrize('indexer', [ lambda g: g.x, lambda g: g, lambda g: g[['x']], #lambda g: g[['x', 'y']] ]) def test_groupby_windowing_n(func, n, getter, grouper, indexer): df = pd.DataFrame({'x': np.arange(10, dtype=float), 'y': [1.0, 2.0] * 5}) sdf = DataFrame(example=df) def f(x): return func(indexer(x.groupby(grouper(x)))) L = f(sdf.window(n=n)).stream.gather().sink_to_list() diff = 3 for i in range(0, 10, diff): sdf.emit(df.iloc[i: i + diff]) sdf.emit(df.iloc[:0]) assert len(L) == 5 first = df.iloc[max(0, diff - n): diff] > assert_eq(L[0], f(first)) streamz/dataframe/tests/test_dataframes.py:900: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = y 1.0 2 2.0 1 dtype: int32, b = y 1.0 2 2.0 1 dtype: int64 check_names = True, check_dtype = True, check_divisions = True check_index = True, sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, 
(pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs ) elif isinstance(a, pd.Series): > tm.assert_series_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of Series are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:574: AssertionError ________________ test_groupby_aggregate_with_start_state[core] _________________ stream = def test_groupby_aggregate_with_start_state(stream): example = pd.DataFrame({'name': [], 'amount': []}) sdf = DataFrame(stream, example=example).groupby(['name']) output0 = sdf.amount.sum(start=None).stream.gather().sink_to_list() output1 = sdf.amount.mean(with_state=True, start=None).stream.gather().sink_to_list() output2 = sdf.amount.count(start=None).stream.gather().sink_to_list() df = pd.DataFrame({'name': ['Alice', 'Tom'], 'amount': [50, 100]}) stream.emit(df) out_df0 = pd.DataFrame({'name': ['Alice', 'Tom'], 'amount': [50.0, 100.0]}) out_df1 = pd.DataFrame({'name': ['Alice', 'Tom'], 'amount': [1, 1]}) assert assert_eq(output0[0].reset_index(), out_df0) assert assert_eq(output1[0][1].reset_index(), out_df0) > assert assert_eq(output2[0].reset_index(), out_df1) streamz/dataframe/tests/test_dataframes.py:1004: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = name amount 0 Alice 1 1 Tom 1 b = name amount 0 Alice 1 1 Tom 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: 
assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 1] (column name="amount") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ________________ test_groupby_aggregate_with_start_state[dask] _________________ stream = def test_groupby_aggregate_with_start_state(stream): example = pd.DataFrame({'name': [], 'amount': []}) sdf = DataFrame(stream, example=example).groupby(['name']) output0 = sdf.amount.sum(start=None).stream.gather().sink_to_list() output1 = sdf.amount.mean(with_state=True, start=None).stream.gather().sink_to_list() output2 = sdf.amount.count(start=None).stream.gather().sink_to_list() df = pd.DataFrame({'name': ['Alice', 'Tom'], 'amount': [50, 100]}) stream.emit(df) out_df0 = pd.DataFrame({'name': ['Alice', 'Tom'], 'amount': [50.0, 100.0]}) out_df1 = pd.DataFrame({'name': ['Alice', 'Tom'], 'amount': [1, 1]}) assert assert_eq(output0[0].reset_index(), out_df0) assert assert_eq(output1[0][1].reset_index(), out_df0) > 
assert assert_eq(output2[0].reset_index(), out_df1) streamz/dataframe/tests/test_dataframes.py:1004: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ a = name amount 0 Alice 1 1 Tom 1 b = name amount 0 Alice 1 1 Tom 1, check_names = True check_dtype = True, check_divisions = True, check_index = True sort_results = True, scheduler = 'sync', kwargs = {} def assert_eq( a, b, check_names=True, check_dtype=True, check_divisions=True, check_index=True, sort_results=True, scheduler="sync", **kwargs, ): if check_divisions: assert_divisions(a, scheduler=scheduler) assert_divisions(b, scheduler=scheduler) if hasattr(a, "divisions") and hasattr(b, "divisions"): at = type(np.asarray(a.divisions).tolist()[0]) # numpy to python bt = type(np.asarray(b.divisions).tolist()[0]) # scalar conversion assert at == bt, (at, bt) assert_sane_keynames(a) assert_sane_keynames(b) a = _check_dask( a, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) b = _check_dask( b, check_names=check_names, check_dtypes=check_dtype, scheduler=scheduler ) if hasattr(a, "to_pandas"): a = a.to_pandas() if hasattr(b, "to_pandas"): b = b.to_pandas() if isinstance(a, (pd.DataFrame, pd.Series)) and sort_results: a = _maybe_sort(a, check_index) b = _maybe_sort(b, check_index) if not check_index: a = a.reset_index(drop=True) b = b.reset_index(drop=True) if isinstance(a, pd.DataFrame): > tm.assert_frame_equal( a, b, check_names=check_names, check_dtype=check_dtype, **kwargs E AssertionError: Attributes of DataFrame.iloc[:, 1] (column name="amount") are different E E Attribute "dtype" are different E [left]: int32 E [right]: int64 /usr/lib/python3/dist-packages/dask/dataframe/utils.py:570: AssertionError ____________________________ test_delay_ref_counts _____________________________ def test_func(): with pristine_loop() as loop: cor = gen.coroutine(func) try: > loop.run_sync(cor, timeout=timeout) streamz/utils_test.py:70: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ /usr/lib/python3/dist-packages/tornado/ioloop.py:529: in run_sync return future_cell[0].result() /usr/lib/python3/dist-packages/tornado/gen.py:782: in run yielded = self.gen.send(value) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ @gen_test() def test_delay_ref_counts(): source = Stream(asynchronous=True) _ = source.delay(0.01) refs = [] for i in range(5): r = RefCounter() refs.append(r) source.emit(i, metadata=[{'ref': r}]) assert all(r.count == 1 for r in refs) yield gen.sleep(0.05) > assert all(r.count == 0 for r in refs) E assert False E + where False = all(<generator object test_delay_ref_counts.<locals>.<genexpr> at 0xb0664098>) streamz/tests/test_core.py:547: AssertionError =============================== warnings summary =============================== streamz/tests/test_dask.py:244 /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/tests/test_dask.py:244: PytestUnknownMarkWarning: Unknown pytest.mark.asyncio - is this a typo? 
You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html @pytest.mark.asyncio .pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests/test_dataframes.py: 5 warnings .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_batch.py: 1 warning .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_core.py: 5 warnings .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_dask.py: 4 warnings .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_sources.py: 3 warnings /usr/lib/python3/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first self.make_current() .pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests/test_dataframes.py::test_identity[core] /usr/lib/python3/dist-packages/distributed/utils.py:165: RuntimeWarning: Couldn't detect a suitable IP address for reaching '8.8.8.8', defaulting to hostname: [Errno 101] Network is unreachable warnings.warn( .pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests/test_dataframes.py: 8 warnings .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_core.py: 40 warnings .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_sources.py: 6 warnings /usr/lib/python3/dist-packages/tornado/ioloop.py:227: DeprecationWarning: clear_current is deprecated IOLoop.clear_current() .pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests/test_dataframes.py: 4 warnings .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_core.py: 20 warnings .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_sources.py: 3 warnings /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/utils_test.py:47: DeprecationWarning: clear_current is deprecated IOLoop.clear_current() .pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests/test_dataframes.py: 4 warnings .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_core.py: 20 warnings 
.pybuild/cpython3_3.11_streamz/build/streamz/tests/test_sources.py: 3 warnings /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/utils_test.py:49: DeprecationWarning: make_current is deprecated; start the event loop first loop.make_current() .pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests/test_dataframes.py: 4 warnings .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_core.py: 20 warnings .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_sources.py: 3 warnings /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/utils_test.py:55: DeprecationWarning: clear_current is deprecated IOLoop.clear_current() .pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests/test_dataframes.py::test_windowing_n[1-1-4] .pybuild/cpython3_3.11_streamz/build/streamz/dataframe/tests/test_dataframes.py::test_windowing_n[1-1-5] /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build/streamz/dataframe/aggregations.py:99: RuntimeWarning: invalid value encountered in scalar divide result = result * n / (n - self.ddof) .pybuild/cpython3_3.11_streamz/build/streamz/tests/test_dask.py::test_stream_shares_client_loop /usr/lib/python3/dist-packages/_pytest/python.py:184: PytestUnhandledCoroutineWarning: async def functions are not natively supported and have been skipped. You need to install a suitable plugin for your async framework, for example: - anyio - pytest-asyncio - pytest-tornasync - pytest-trio - pytest-twisted warnings.warn(PytestUnhandledCoroutineWarning(msg.format(nodeid))) -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html ===Flaky Test Report=== test_tcp passed 1 out of the required 1 times. Success! test_tcp_async passed 1 out of the required 1 times. Success! 
===End Flaky Test Report=== =========================== short test summary info ============================ FAILED streamz/dataframe/tests/test_dataframes.py::test_dataframe_simple[1] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-0-0-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-0-2-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-0-3-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-1-0-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-1-2-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-1-3-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-2-0-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-2-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-2-2-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[core-2-3-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-0-0-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-0-2-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-0-3-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-1-0-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-1-2-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-1-3-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-2-0-2] FAILED 
streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-2-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-2-2-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate[dask-2-3-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_value_counts[core] - ... FAILED streamz/dataframe/tests/test_dataframes.py::test_value_counts[dask] - ... FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-0-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-0-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-0-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-0-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-1-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-1-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-1-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-1-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-2-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-2-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-2-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-2-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-3-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-3-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-3-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[0-3-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-0-0-10h-2] FAILED 
streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-0-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-0-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-0-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-1-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-1-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-1-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-1-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-2-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-2-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-2-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-2-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-3-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-3-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-3-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[1-3-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-0-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-0-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-0-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-0-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-1-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-1-0-1d-2] FAILED 
streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-1-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-1-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-2-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-2-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-2-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-2-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-3-0-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-3-0-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-3-1-10h-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_value[2-3-1-1d-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-0-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-0-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-0-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-0-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-0-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-0-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-0-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-0-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-1-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-1-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-1-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-1-0-4-3] FAILED 
streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-1-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-1-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-1-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-1-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-2-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-2-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-2-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-2-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-2-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-2-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-2-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-2-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-3-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-3-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-3-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-3-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-3-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-3-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-3-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[0-3-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-0-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-0-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-0-0-4-2] 
FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-0-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-0-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-0-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-0-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-0-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-1-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-1-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-1-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-1-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-1-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-1-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-1-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-1-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-2-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-2-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-2-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-2-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-2-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-2-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-2-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-2-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-3-0-1-2] FAILED 
streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-3-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-3-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-3-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-3-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-3-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-3-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[1-3-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-0-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-0-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-0-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-0-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-0-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-0-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-0-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-0-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-1-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-1-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-1-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-1-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-1-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-1-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-1-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-1-1-4-3] 
FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-2-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-2-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-2-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-2-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-2-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-2-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-2-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-2-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-3-0-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-3-0-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-3-0-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-3-0-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-3-1-1-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-3-1-1-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-3-1-4-2] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_windowing_n[2-3-1-4-3] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate_with_start_state[core] FAILED streamz/dataframe/tests/test_dataframes.py::test_groupby_aggregate_with_start_state[dask] FAILED streamz/tests/test_core.py::test_delay_ref_counts - assert False = 174 failed, 848 passed, 442 skipped, 9 xfailed, 99 xpassed, 158 warnings in 244.51s (0:04:04) = E: pybuild pybuild:388: test: plugin distutils failed with: exit code=1: cd /build/reproducible-path/python-streamz-0.6.4/.pybuild/cpython3_3.11_streamz/build; python3.11 -m pytest dh_auto_test: error: pybuild --test --test-pytest -i 
python{version} -p 3.11 returned exit code 13 make: *** [debian/rules:8: binary] Error 25 dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2 I: copying local configuration E: Failed autobuilding of package I: unmounting dev/ptmx filesystem I: unmounting dev/pts filesystem I: unmounting dev/shm filesystem I: unmounting proc filesystem I: unmounting sys filesystem I: cleaning the build env I: removing directory /srv/workspace/pbuilder/7484 and its subdirectories Tue Jan 23 23:56:06 UTC 2024 W: No second build log, what happened?