About Us

The first Italian Python event for Pythonistas and Data Geeks.

PyData

Starting out with a PyData Workshop at the Googleplex in Mountain View, CA, in 2012, PyData has evolved into a successful conference series on using Python for the management, processing, analysis, and visualization of data. Alongside the popular conferences and the excellent PyData tools and packages, a steadily growing PyData community has formed.
Visit the PyData website or search for #pydata on Twitter to find out more.

PyData Florence @ PyCon Sei

The PyData Florence event will be hosted at the PyCon Sei Conference, the sixth edition of the PyCon Italia Conference. This year PyCon Sei will provide dedicated tracks and talks for different sub-communities, and PyData will be one of them! PyCon Sei will be held in Florence, which is renowned for its vibrant community of Pythonistas. PyData Florence will provide a meeting place where data scientists and engineers can join efforts, and will support the building of a strong Italian PyData community.
Sign up for the PyData Italy mailing list to stay updated on PyData Florence and on future events and meetups!

NumFOCUS

The PyData conference series is organized and supported by NumFOCUS, a non-profit organization that promotes world-class, innovative, open source scientific software. NumFOCUS aims to ensure that money is available to keep projects in the scientific Python stack funded and available. So if you find value in these tools and have always wanted to give back, donating to NumFOCUS gives you a way of supporting either a specific project of your choice or all of these great projects at once.
NumFOCUS website

Conference

The PyData Community

PyData Conference Mission

PyData is a gathering of users and developers of data analysis tools in Python. The goal is to provide Python enthusiasts a place to share ideas and learn from each other about how best to apply our language and tools to the ever-evolving challenges in the vast realm of data management, processing, analytics, and visualization.

Read the full mission statement

PyData@PyConSei, April 17-19, 2015

The third PyData conference outside the US, after PyData London 2014 and PyData Berlin 2014, and the first to be held in 2015, will take place in Florence.
PyData Florence will be hosted at PyCon Sei, the sixth edition of the Italian Python Conference. PyCon Sei will provide a special track entirely dedicated to data scientists and engineers who are willing to share their knowledge of data tools and applications with the Python community. Accepted contributions to the PyData@PyConSei track will be regular talks and trainings.
For further information, please visit the Call for Proposals page on the PyCon Sei official website.

Joining the Online PyData community

To get involved with the community, sign up for an account to contribute blog posts, submit talks, or volunteer for the organizing committee of an upcoming conference. This website and pydata.org are resources for our community.

News

The latest updates about PyData and PyData@PyConSei

Early-bird Deadline

The Early-Bird fare for tickets will end on February 28, 2015.
More info is available on the PyCon Sei official website: Registration

Talk Voting is Open

The Community Voting for the talks submitted to PyData@PyConSei is now open, and will last until February 11, 2015 at 23:59:59 CET.
All attendees who have already purchased their tickets, and everyone who submitted a proposal, are now eligible to cast their votes on all the proposals submitted to PyCon Sei.
Cast your vote, and be part of the Conference!

Thank you!

Thanks a lot to everyone who submitted a talk proposal to PyData@PyConSei!

Call for Proposals: Deadline Extended

The deadline to submit a proposal to the PyData track has been extended to February 9, 2015.
More info is available on the PyCon Sei official website: Call for Proposals

Schedule

Accepted Talks and Events

Day 1: Friday, 17th April 2015

  • Registration

    Friday, 17 April 2015 - 09:00

  • Opening: A New Model of PyCon

    Friday, 17 April 2015 - 09:30

  • Alex Martelli

    Speaker: Alex Martelli

    Friday, 17 April 2015 - 10:00

    As Python grows, as the problem spaces we address keep shifting, and as best practices for software development mature, the set of best-of-breed patterns and idioms changes as well: some classics fade, new stars emerge. Python itself has grown to encompass some classic idioms, such as Decorate-Sort-Undecorate, AKA DSU, begetting the widespread **key=** argument to most functions related to ordering -- but not quite all of them: `heapq`, for example, still mostly lacks `key=` -- so we also show which idioms to use with this and similar modules. Lists have long been one of Python's strengths, and they're of course still precious -- but many kinds of specialized containers have emerged, and it's important to know how to choose among them, and when and how to roll your own. More important still, *iterators* have grown into prominence, and very often they'll be the best choice -- and they come with a large set of relevant patterns and idioms. The tectonic shift that's taking us from classic to modern Python goes even deeper -- even the dominance of good old *duck typing* is threatened! Specifically, in many cases we use, instead, **goose** typing -- checking against an abstract base class -- and, as type annotations slowly emerge, they reinforce this general tendency. These, and a miscellanea of smaller patterns and idioms (concerning I/O, best uses of *dict*s and other specialized mappings, async operations, testing, ...), are fast becoming indispensable parts of the Proficient Pythonista's repertoire, and this talk helps fill the gap between yesterday's good old Python and tomorrow's glittering vistas.
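
    As a quick taste of the idioms in question, here is a minimal sketch of the modern `key=` argument versus classic Decorate-Sort-Undecorate, including the DSU-style tuples that `heapq` still calls for, since most of its functions lack `key=`:

    ```python
    import heapq

    words = ["banana", "fig", "cherry", "date"]

    # Modern idiom: sort by length with key= (no decoration needed).
    by_length = sorted(words, key=len)

    # Classic DSU: decorate with the sort key, sort, then undecorate.
    decorated = [(len(w), w) for w in words]
    decorated.sort()
    assert [w for _, w in decorated] == by_length

    # heapq.heappush has no key=, so (key, item) tuples are the usual idiom.
    heap = []
    for w in words:
        heapq.heappush(heap, (len(w), w))
    shortest = heapq.heappop(heap)[1]  # "fig"
    ```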

  • Light Refreshments

    Friday, 17 April 2015 - 10:30

  • Davide Setti

    Speaker: Davide Setti

    Friday, 17 April 2015 - 11:15

    An idyllic life of list comprehensions and generators, without worrying too much about types. You are happy. And then he shows up: Hadoop. But do not lose heart -- there is hope. How do you survive in a Java ecosystem? The goal of this talk is to introduce Hadoop and show how, and when it makes sense, to use Python with it. We will introduce: Hadoop MapReduce and HDFS, Hadoop Streaming, Pig, YARN, and Spark. The point is not to force Python where it does not belong, but to suggest, based on hands-on experience, when it is useful and what its limits are.
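
    To give a flavour of the Hadoop Streaming approach the talk covers, here is a minimal word-count sketch: Hadoop runs the mapper and reducer as external processes that read stdin and write tab-separated key/value pairs to stdout, and the reducer relies on Hadoop sorting the mapper output by key:

    ```python
    # mapper.py -- emit (word, 1) for every word read from stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print("%s\t1" % word)
    ```

    ```python
    # reducer.py -- counts for the same word arrive on consecutive lines,
    # because Hadoop sorts the mapper output by key before piping it here.
    import sys

    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                print("%s\t%d" % (current_word, count))
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print("%s\t%d" % (current_word, count))
    ```

    The pair can be tested locally with a plain shell pipeline (`cat input.txt | python mapper.py | sort | python reducer.py`) before handing it to the hadoop-streaming jar.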

  • Valerio Maggio

    Speaker: Valerio Maggio

    Friday, 17 April 2015 - 12:00

    In recent years, the Python programming language has undergone a major renovation, and *Python 3.x* is set to become the next-generation reference version of the language. However, despite the community's efforts to support the switch among Pythonistas, Python 3 is not yet the *de facto* reference version, and *Python 2.x* is still going strong. Moreover, as for the scientific Python community, only a few years ago most scientific Python packages only (or mainly) supported `Python 2`. In this talk we will analyse how far Python 3 support extends across the Python scientific packages, emphasising what actually works and what doesn't. The general claim of this talk is that, from a technical perspective, Python 3 is a more mature language than Python 2 from many points of view, and the scientific community is now more able than ever to switch to Python 3. However, so far, the switch has not happened. Thus, some reflections on this "particular" community of Pythonistas will conclude the talk, in order to stimulate discussion and derive together possible solutions and workarounds to support this change.
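
    For context, a minimal sketch of the single-source idiom that keeps one codebase running on both interpreter lines during the transition:

    ```python
    from __future__ import division, print_function

    import sys

    # The __future__ imports give Python 2 the Python 3 semantics, so this
    # module behaves identically under both interpreters.
    print("running on Python", sys.version_info[0])
    assert 1 / 2 == 0.5  # true division on Python 2 as well
    ```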

  • Bence Faludi

    Speaker: Bence Faludi

    Friday, 17 April 2015 - 12:30

    mETL is an ETL package written in Python. The program can be used to load practically any kind of data into any target. The code is open source and available to anyone who wants to use it. Its main advantage is that it is configurable via YAML files; you have the possibility to write any transformation in Python, and you can use it natively from any framework as well. We are using this tool in production for many of our clients, and it is really stable and reliable. The project has a few contributors from all around the world right now, and I hope many more developers will join soon. I want to introduce this tool to you. In this presentation I will show you its functionality and the common use cases. Furthermore, I will talk about other ETL tools in Python.
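
    The snippet below is not mETL's actual API -- just a hedged sketch, with hypothetical field names, of the general idea the abstract describes: a pipeline configured in YAML, with the transformations themselves written as plain Python functions:

    ```python
    import yaml  # PyYAML

    # Hypothetical config; mETL's real YAML schema differs.
    config = yaml.safe_load("""
    source: people.csv
    transforms:
      - upper_name
    """)

    def upper_name(record):
        record["name"] = record["name"].upper()
        return record

    TRANSFORMS = {"upper_name": upper_name}

    def run(records, cfg):
        # Apply each transformation named in the YAML config, in order.
        for name in cfg["transforms"]:
            records = [TRANSFORMS[name](r) for r in records]
        return records

    print(run([{"name": "ada"}], config))  # [{'name': 'ADA'}]
    ```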

  • Lunch

    Friday, 17 April 2015 - 13:00

  • Christian Barra

    Speaker: Christian Barra

    Friday, 17 April 2015 - 14:30

    The aim of this talk is to introduce you to the wonderful world of **Kung Fu**... with Pandas! We will discuss what Pandas is, what it can do for you, and how to use it, with some practical examples. A powerful tool to manage and analyse your data! Prerequisites: knowledge of Python. Goals: using Pandas as a tool to process and analyse data.
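
    A small taste of the tool (with invented data): build a DataFrame, filter it with a boolean mask, and aggregate with groupby:

    ```python
    import pandas as pd

    df = pd.DataFrame({
        "city": ["Florence", "Florence", "Rome", "Rome"],
        "year": [2014, 2015, 2014, 2015],
        "attendees": [350, 420, 280, 310],
    })

    recent = df[df["year"] == 2015]                 # boolean filtering
    totals = df.groupby("city")["attendees"].sum()  # split-apply-combine
    print(recent)
    print(totals)
    ```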

  • Fabio Pliger

    Speaker: Fabio Pliger

    Friday, 17 April 2015 - 15:30

    Bokeh is a Python interactive visualization library for large datasets that natively uses the latest web technologies. Its goal is to provide elegant, concise construction of novel graphics in the style of Protovis/D3, while delivering high-performance interactivity over large data to thin clients. The talk will go through its design, providing details of the different API layers (bottom to top), and conclude with a comprehensive showcase of examples that expose many of the features that make Bokeh so powerful and easy to use.
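
    For a first flavour of the API, a minimal plot in the 0.8-era `bokeh.plotting` interface, which writes a standalone HTML page rendered by BokehJS in the browser:

    ```python
    from bokeh.plotting import figure, output_file, show

    x = [1, 2, 3, 4, 5]
    y = [6, 7, 2, 4, 5]

    output_file("lines.html")                  # target HTML document
    p = figure(title="simple line example")    # plot container
    p.line(x, y, legend="temp", line_width=2)  # add a line glyph
    show(p)                                    # open the page in a browser
    ```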

  • Coffee Break

    Friday, 17 April 2015 - 16:30

  • Luca Mearelli

    Speaker: Luca Mearelli

    Friday, 17 April 2015 - 17:15

    Every conversation has a structure. The network formed by the people who interact in the conversations inside an online community can thus be analysed with the tools of network science to understand its characteristics. By enriching online conversations with social network analysis, we hope to give community moderators tools that help them build better communities and guide the collective intelligence processes inside them. The grand vision is to contribute to building the engines that could power effective platforms for participatory democracy (a better overview of these themes is given by my colleague Alberto Cottica in [this TEDx talk][1]).

    Edgesense is our attempt at making social network analysis tools accessible to all those who could gain the most from them but may not yet know their potential. Edgesense is built around a set of scripts that process the community data and extract the network structure and the most relevant network metrics, plus a dashboard that presents them in a clear view. Through the dashboard, community managers can for the first time see at a glance who is talking to whom, which users are central to the community, and who is on the periphery. They can see which sub-communities are developing inside the larger online conversation and who is acting as a bridge between them. This is very useful to guide the conversation or to determine which users have the most authority (through measures of centrality or PageRank).

    Python has been central to the development of Edgesense: it has made it possible to create a processing pipeline that takes the data from various sources (Drupal community sites, Twitter conversations, mailing list data) and builds the network of interactions among the users. Python has also enabled us to choose from a very rich library of social network analysis algorithms to calculate the metrics that we present in the dashboard to help understand the social interactions.

    During this talk we'll look at the choices we faced when building a data processing application that takes real-life data, extracts useful information, and prepares it for visualisation. We'll see a primer on the network science required to interpret the structure of the user community, and how the algorithms and the metrics were chosen. We'll also present the challenges we are facing to extend Edgesense to ever larger communities. The presentation will be useful both to developers interested in using Python for social network analysis and to those already skilled in it, who could find inspiration in the problems we are trying to solve with Edgesense.

    Edgesense has been developed by [Wikitalia][2] within the [CATALYST EC project][3] on collective intelligence, and it is available as open source software: [https://github.com/Wikitalia/edgesense][4] (an example of a live dashboard is available at [http://matera2019.edgesense.spazidigitali.com/][5]).

    [1]: https://www.youtube.com/watch?v=KKrM2c-ww_k
    [2]: http://www.wikitalia.it
    [3]: http://catalyst-fp7.eu/
    [4]: https://github.com/Wikitalia/edgesense
    [5]: http://matera2019.edgesense.spazidigitali.com/
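
    As a hedged sketch of the kind of metrics involved (the toy graph below is invented, and Edgesense's internals may differ), `networkx` makes centrality and PageRank one-liners:

    ```python
    import networkx as nx

    # Directed graph: an edge u -> v means "u replied to v".
    G = nx.DiGraph()
    G.add_edges_from([
        ("anna", "bruno"), ("carla", "bruno"),
        ("bruno", "anna"), ("dario", "anna"),
    ])

    centrality = nx.in_degree_centrality(G)  # who receives the most replies
    authority = nx.pagerank(G)               # PageRank as a measure of authority

    print(sorted(authority, key=authority.get, reverse=True))  # most central first
    ```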

  • Peadar Coyle

    Speaker: Peadar Coyle

    Friday, 17 April 2015 - 18:00

    One of the biggest challenges we have as data scientists is getting our models into production. I've worked with Java developers to get models into production, and the libraries available in Java aren't always the same as in Python -- for example, try porting scikit-learn code to Java. One possible solution is PMML, or writing a spec by hand. An even better solution: I will explain how to use Science Ops from YhatHQ to build better data products. Specifically, I will talk about how to use Python, Pandas, etc. to build a model, test it locally, and then deploy it so that developers get an easy-to-use RESTful API. I will share some of my experiences working with it, and give a use case and some architectural remarks. I'll also give a rundown of the alternatives to Science Ops that I've found. Prerequisites: some experience with Pandas and the scientific Python stack would be beneficial. This talk is aimed at data science enthusiasts and professionals.
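
    Science Ops is a commercial product, so as a stand-in here is a hedged sketch of the underlying pattern: a scikit-learn model exposed through a small RESTful API, using Flask purely for illustration:

    ```python
    from flask import Flask, jsonify, request
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    app = Flask(__name__)

    # Train once at startup; a real deployment would load a persisted model.
    iris = load_iris()
    model = RandomForestClassifier().fit(iris.data, iris.target)

    @app.route("/predict", methods=["POST"])
    def predict():
        features = request.get_json()["features"]  # e.g. [5.1, 3.5, 1.4, 0.2]
        label = int(model.predict([features])[0])
        return jsonify({"class": str(iris.target_names[label])})

    if __name__ == "__main__":
        app.run()
    ```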

  • PyData Community Meeting

    Friday, 17 April 2015 - 18:45

  • Social Event: PyBeer

    Friday, 17 April 2015 - 21:30

Day 2: Saturday, 18th April 2015

  • Registration

    Saturday, 18 April 2015 - 08:00

  • Recruiting Session

    Saturday, 18 April 2015 - 09:30

  • Light Refreshments

    Saturday, 18 April 2015 - 10:30

  • Radim Řehůřek

    Speaker: Radim Řehůřek

    Saturday, 18 April 2015 - 11:15

    With the multitude of data mining tools coming out in the data science world, Python is one choice of many. How does it compare? This talk looks into some pragmatic aspects of its (data mining) ecosystem, its baggage and its future.

  • Valerio Maggio

    Speaker: Valerio Maggio

    Saturday, 18 April 2015 - 12:15

    Machine learning is an amazing research and application field, which perfectly matches math skills with coding abilities in order to define *programs that are able to learn from data*. Therefore, after having defined our own (mathematical) model, machine learning is about writing code -- sometimes a lot of it -- to actually make the model work. However, one point usually underestimated or omitted when dealing with machine learning algorithms is how to write *good quality* code. Test-driven development (TDD) is one of the most popular agile methods, specifically designed to support developers in producing (potentially) less buggy code by writing tests before the actual code *under test*. The application of test-first programming principles to the implementation of *Naive Bayes classifiers* or *Neural networks* looks like a daunting challenge. Conversely, the `test-code-refactor` cycle strategy is grounded in the scientific method: make a proposition of validity, share results, work in feedback loops. Moreover, this kind of approach to tackling problems would, in this particular case, also allow for a better understanding of how the whole learning model works under the hood. In this talk, examples of test-driven implementations of some of the most famous machine learning algorithms will be presented using `scikit-learn`. The talk is intended for an *intermediate* audience. The content is intended to be mostly practical and code-oriented, so good proficiency with the Python language is **required**. Conversely, **no prior knowledge** of TDD or machine learning algorithms is necessary to attend this talk.
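
    In the same test-first spirit, a minimal sketch with `unittest` and synthetic data: state the expected behaviour of a Naive Bayes classifier on deliberately easy, well-separated clusters before trusting it:

    ```python
    import unittest

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    class TestGaussianNB(unittest.TestCase):
        def setUp(self):
            # Two well-separated clusters: class 0 around 0, class 1 around 10.
            rng = np.random.RandomState(42)
            self.X = np.vstack([rng.normal(0, 1, (50, 2)),
                                rng.normal(10, 1, (50, 2))])
            self.y = np.array([0] * 50 + [1] * 50)

        def test_separable_data_is_classified_correctly(self):
            clf = GaussianNB().fit(self.X, self.y)
            self.assertGreater(clf.score(self.X, self.y), 0.99)

    if __name__ == "__main__":
        unittest.main()
    ```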

  • Lunch

    Saturday, 18 April 2015 - 13:00

  • Francesco Cavazzana

    Speaker: Francesco Cavazzana

    Saturday, 18 April 2015 - 14:30

    The talk presents a concrete case of using Python, with various libraries, to analyse open-format data in the economic domain. By law, every Italian public body must publish a freely accessible XML file with the full details of its spending on works, goods, and services. This mass of data can yield very interesting information for monitoring public spending: benchmarks for sector operators, citizens' oversight of public expenditure, corruption prevention, but also market analysis and competitor comparison for companies supplying the public administration. To analyse this data, which has been public for a year now but is still little used, I used Python to download and parse the XML files, organise the collected data in a database, and analyse it with indices, charts, sector comparisons, trends over time, and regression analysis. The analyses were carried out mainly with scipy (including an example of multiprocess parallel computation). I also used the statistical software R, via rpy2, for some analyses and for generating charts. The analysis reports are produced in Excel with xlsxwriter, with many charts drawn directly by Excel but parametrised from Python. For XML parsing, a few web pages for management, and for holding it all together, I used genropy. The talk may be interesting as a demonstration of how the Python scientific libraries can be used quickly, and with a truly minimal learning curve, not only by statisticians or mathematicians but by an economist like me, to obtain results that may be simple for domain experts (I have barely scratched the surface of what these tools can do) but are nonetheless astonishing for anyone used to crunching numbers in Excel alone. Prerequisites for full understanding are a working knowledge of Python and an idea of what XML is. As far as the scipy and rpy2 libraries are concerned, the talk is absolutely introductory: an example of how even a total beginner like me can use them profitably.
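
    As a hedged illustration of the first step of such a pipeline -- the tag names below are hypothetical, not the actual schema mandated by the Italian law -- here is XML parsing with the standard library alone:

    ```python
    import xml.etree.ElementTree as ET

    # Hypothetical procurement record; the real files follow a legal schema.
    xml_data = """
    <contracts>
      <contract>
        <supplier>ACME S.p.A.</supplier>
        <amount>12500.00</amount>
      </contract>
    </contracts>
    """

    root = ET.fromstring(xml_data)
    rows = [(c.findtext("supplier"), float(c.findtext("amount")))
            for c in root.iter("contract")]
    print(rows)  # [('ACME S.p.A.', 12500.0)]
    ```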

  • Valerio Maggio

    Speaker: Valerio Maggio

    Saturday, 18 April 2015 - 15:00

    **Machine Learning** focuses on *constructing algorithms for making predictions from data*. These algorithms usually require *huge* amounts of data to analyse, thus incurring high computational costs and requiring easily scalable solutions to be effectively applied. These factors have fostered an ever-increasing interest in *scaling up* machine learning applications. [**Scikit-learn**](http://scikit-learn.org/stable/) is one of the most popular machine learning libraries in Python, providing implementations of several machine learning methods, along with datasets and (performance) evaluation algorithms. In this talk, some recipes to scale up machine learning algorithms with scikit-learn will be presented. The talk will go over several examples and case studies, presented in a *problem-to-solution* way in order to engage discussion during and after the talk. The talk is intended for an intermediate audience. It requires (very) basic math skills and a good knowledge of the Python language. Good knowledge of the `numpy` and `scipy` packages is also a plus.
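
    One classic recipe of this kind, sketched on synthetic data: out-of-core learning with `partial_fit`, feeding the model one mini-batch at a time so the full dataset never has to fit in memory:

    ```python
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    clf = SGDClassifier()
    classes = np.array([0, 1])  # must be declared up front for partial_fit

    for _ in range(100):  # pretend each iteration reads a batch from disk
        X = rng.normal(size=(1000, 20))
        y = (X[:, 0] > 0).astype(int)
        clf.partial_fit(X, y, classes=classes)

    X_test = rng.normal(size=(100, 20))
    print(clf.score(X_test, (X_test[:, 0] > 0).astype(int)))
    ```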

  • Fabio Pliger

    Speaker: Fabio Pliger

    Saturday, 18 April 2015 - 15:45

    Bokeh is a new graphics framework for visualising large datasets (in Python, and not only!) with native support for the latest web technologies. The goal of this broad and ambitious project is to provide an elegant and concise tool for building infographics, charts, and interactive HTML interfaces in the style of Protovis/D3, while at the same time delivering very high-performance interactivity over enormous amounts of data. Having reached version 0.8, the project has attained a level of functionality and stability that has sparked enormous interest and adoption well beyond the scientific community, to the point of being chosen by Facebook as the backend for plot and image generation in iTorch. The talk will present Bokeh's distinctive architecture and its various features and APIs, concluding with a series of practical application examples and small scripts that aim to expose as much as possible of its peculiarities and potential.

  • Coffee Break

    Saturday, 18 April 2015 - 16:30

  • Ivan Rossi

    Speaker: Ivan Rossi

    Saturday, 18 April 2015 - 17:15

    The talk will briefly describe BioDec's genome annotation pipeline, which has evolved internally for ten years to annotate mammalian, bacterial, and viral genomes at the protein level. Everything we built is coded in Python: the machine-learning annotation engine, the middleware, and the data visualization interface -- software that would have been extremely difficult to create, for a tiny shop like BioDec, without leveraging the language itself and projects such as BioPython, Scipy, Plone, and Web2py. We will try to distill some of the lessons we learned from more than ten years of experience dealing with machine-learning-intensive applications and large data sets with open-source software and Python.

  • Francisco Fernández Castaño
  • Ezio Melotti

    Speaker: Ezio Melotti

    Saturday, 18 April 2015 - 18:15

    The European MaRs Analogue Station for Advanced Technologies Integration (ERAS) is a program spearheaded by the Italian Mars Society (IMS), whose main goal is to provide an effective test bed for field operation studies in preparation for human missions to Mars. Prior to its construction, IMS has started the development of an immersive Virtual Reality (VR) simulation of the ERAS Station (V-ERAS). The initial V-ERAS setup has been based on the following key elements: ERAS Station simulation using an appropriate game engine supporting a virtual reality headset; full body tracking; integration of an omnidirectional treadmill; support for the crew members' health monitoring; and multiplayer support. Since the beginning, Python has been one of the key technologies that allowed us to develop the system, thanks also to a team of international students who participated with the IMS in Google Summer of Code. In December 2014, four virtual astronauts conducted a sustained program of immersive virtual reality simulations during the first week-long V-ERAS Mission (V-ERAS-14). The presentation will report on the outcomes of the V-ERAS-14 Mission while focusing on the transversal role played by the Python language in supporting the project infrastructure.

  • Social Event: PyFiorentina

    Saturday, 18 April 2015 - 21:00

Day 3: Sunday, 19th April 2015

  • Registration

    Sunday, 19 April 2015 - 08:00

  • Fabio Pliger

    Speaker: Fabio Pliger

    Sunday, 19 April 2015 - 09:00

    Keynote: The IoT (r)evolution, present and future

  • Gianfranco Durin

    Speaker: Gianfranco Durin

    Sunday, 19 April 2015 - 10:00

    In many physical systems, it is important to be able to detect events by analysing a sequence of images. One example among many is analysing the magnetisation dynamics of a thin magnetic film measured with magneto-optical techniques. The measured sequences usually contain very noisy images in which the objects (in our case, regions of magnetisation reversal) are hard to recognise because their contours are not well defined. Standard border-recognition techniques such as edge detection therefore do not apply. This talk proposes a very simple yet powerful alternative technique: analysing the colour (or gray-scale) sequence of a single pixel and then reconstructing the dynamics of the whole sequence. The method is also an instructive example of the use of parallel computation. No particular prerequisites are needed to follow the talk.
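
    A hedged sketch of the per-pixel idea on synthetic data: stack the image sequence into a 3-D array and, for each pixel, find the first frame where its gray level crosses a threshold (the switching event):

    ```python
    import numpy as np

    # Synthetic noisy sequence: each pixel switches from ~0.2 to ~0.8
    # at its own frame, stored in switch_at.
    rng = np.random.RandomState(1)
    n_frames, h, w = 50, 16, 16
    switch_at = rng.randint(10, 40, size=(h, w))
    t = np.arange(n_frames)[:, None, None]
    frames = (np.where(t >= switch_at, 0.8, 0.2)
              + rng.normal(0, 0.05, (n_frames, h, w)))

    # For each pixel, the first frame whose value exceeds the threshold.
    switch_frame = np.argmax(frames > 0.5, axis=0)   # shape (h, w)
    print(np.abs(switch_frame - switch_at).mean())   # close to 0
    ```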

  • Light Refreshments

    Sunday, 19 April 2015 - 10:30

  • Danilo Maurizio

    Speaker: Danilo Maurizio

    Sunday, 19 April 2015 - 11:15

    As statisticians with a Mac (a.k.a. data scientists :-) ), more and more often we circle around the central question: how to balance and mix the best of R and Python? We have slowly moved all of our data management towards Python (ETL and data movement) while remaining tied to R for statistical learning. We found ourselves more productive running machine learning algorithms like random forests on scikit-learn, but for time series forecasting or statistical matching (propensity or MIB methods) we rely on R and its libraries. One thing that we now do permanently in Python is deploy to production (web serving/services). In any case, the first thing when approaching each new project is the selection of the whole stack: "when and why" to use R or Python. Analyzing [stackoverflow.com] threads, we understood that we are in good company.
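
    A minimal sketch of the R-from-Python bridging behind this mix, using `rpy2` on toy data: fit a linear model in R on vectors prepared in Python:

    ```python
    import rpy2.robjects as robjects

    r = robjects.r
    robjects.globalenv["x"] = robjects.FloatVector([1, 2, 3, 4, 5])
    robjects.globalenv["y"] = robjects.FloatVector([2.1, 3.9, 6.2, 8.0, 9.8])

    fit = r("lm(y ~ x)")   # ordinary least squares, fitted by R
    print(r["coef"](fit))  # intercept and slope, back in Python
    ```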

  • Lightning Talks

    Sunday, 19 April 2015 - 12:15

  • Lunch

    Sunday, 19 April 2015 - 13:00

  • Sprint

    Sunday, 19 April 2015 - 14:30
