Jfjelstul Worldcup Data-csv Appearances [better] -
In the ecosystem of sports data science, few repositories are as meticulously maintained or as democratically accessible as Joshua Fjelstul’s jfjelstul/worldcup database. While the goals.csv file gets the glory and the matches.csv file provides the narrative spine, there is one table that captures the raw, human cost of the World Cup: appearances.csv .
Using the appearances table, you must calculate time_played = (substitute_out - substitute_in) for each row. For players who played the full 90 (or 120), the logic is different. jfjelstul worldcup data-csv appearances
# Pseudocode for Python (Pandas) avg_sub_time = df[df['substitute_out'].notnull()].groupby('year')['substitute_out'].mean() In the 1980s, the average sub happened in the 75th minute. By 2022, it’s the 58th minute. This table empirically proves the tactical revolution: managers now treat the bench as a weapon, not a lifeboat. 4. The Anomaly Detection: Own Goals and Disciplinary Records Because appearances.csv includes own_goals and red_cards at the player-match level, you can ask bizarre, wonderful questions. In the ecosystem of sports data science, few
SELECT player_name, team, SUM(minutes_played) as total_minutes FROM appearances WHERE tournament = '2022' GROUP BY player_id ORDER BY total_minutes DESC Goalkeepers and center-backs from finalists dominate. In 2022, Emiliano Martínez (Argentina) or Hugo Lloris (France) would top the list with ~690+ minutes. But the real magic is historical: In 2014, Manuel Neuer played every single minute of Germany’s run, including the final. 3. The Tactical Insight: Substitution Dynamics Over Time The substitute_in and substitute_out columns allow you to map the evolution of tactics. Before 1970, substitutions were practically non-existent (injury only). By 2022, five substitutions were allowed. For players who played the full 90 (or