I am currently reading a file and storing the data in an array named `@lines`. Then I loop through this array using a `for` loop, and inside the loop I match on certain values:
At the moment I am using a scalar, `$find`, with a value of `fever`, instead of performing the repetitive statements for each filter. Can I pass an array for comparison instead of a scalar keyword?
alex
4 Answers
If you read a file into a list, it will take everything at once. Contrast this with reading in scalar context, where each read returns a single line. Reading the whole file at once can be a problem for very large files, but you get the idea.
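A minimal sketch of the difference (the filename `data.txt` is an assumption):

```perl
use strict;
use warnings;

# List context: the whole file arrives at once, one line per element
open my $fh, '<', 'data.txt' or die "Cannot open data.txt: $!";
my @lines = <$fh>;
close $fh;

# Scalar context: each read returns just one line
open $fh, '<', 'data.txt' or die "Cannot open data.txt: $!";
while ( my $line = <$fh> ) {
    # process $line here
}
close $fh;
```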
Then you can use `grep` to filter your array, and you can get the count of filtered lines using `scalar`, which forces scalar (that is, single-value) context on the array, in this case returning a count. Putting it all together:
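A sketch of the whole pipeline, extended to take an array of keywords as the question asks (the filename and keyword list are illustrative):

```perl
use strict;
use warnings;

open my $fh, '<', 'data.txt' or die "Cannot open data.txt: $!";
my @lines = <$fh>;
close $fh;

# Build a single alternation regex from an array of keywords
my @keywords = ( 'fever', 'cough', 'headache' );
my $pattern  = join '|', map { quotemeta } @keywords;

my @matches = grep { /\b(?:$pattern)\b/ } @lines;

# scalar forces single-value context: the number of matching lines
my $count = scalar @matches;
print "Matched $count lines\n";
```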
Ed Guiness
If you have Perl 5.10 or later, you can use smart matching (`~~`):

Eugene Yarmash
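A sketch of a smart match against an array (note that smart match was marked experimental in Perl 5.18 and has since been deprecated, so `grep` is the safer long-term choice):

```perl
use strict;
use warnings;
use feature ':5.10';    # enables say

my @keywords = ( 'fever', 'cough' );
my @lines    = ( 'fever', 'sneeze', 'cough' );

for my $line (@lines) {
    # True when $line is string-equal to any element of @keywords
    if ( $line ~~ @keywords ) {
        say "matched: $line";
    }
}
```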
Use Tie::File. It maps the file to an array, which you can manipulate using ordinary array operations. When you untie the array, any changes are saved back to the file.
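A short sketch (the filename is an assumption):

```perl
use strict;
use warnings;
use Tie::File;

tie my @lines, 'Tie::File', 'data.txt'
    or die "Cannot tie data.txt: $!";

# Array operations now act on the file itself
for my $line (@lines) {
    # examine or modify $line in place
}

untie @lines;    # flushes any changes back to disk
```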
Geo
You could also use the `File::Slurp` module, which is convenient.

If you're new to Perl, take a look at the `map` and `grep` operators, which are handy for processing lists.

Also, take a look at the `ack` utility, which is a great replacement for `find`/`grep`. (Actually, it's a superior alternative.)

Lumi
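For example, with some illustrative data:

```perl
use strict;
use warnings;

my @lines = ( "fever 38.5\n", "normal 36.6\n", "fever 39.1\n" );

# grep selects elements; map transforms them
my @fevers = grep { /^fever/ } @lines;
my @temps  = map  { (split ' ')[1] } @fevers;

print "@temps\n";    # 38.5 39.1
```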
I'm trying to open an .html file as one big long string. This is what I've got:
which results in:
<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN'
However, I want the result to look like:
<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN'
'http://www.w3.org/TR/html4/loose.dtd'>
<html>
<head>
<meta http-equiv='Content-Type'>
This way I can search the entire document more easily.
17 Answers
Add `local $/;` before reading from the file handle. See "How can I read in an entire file all at once?", or see "Variables related to filehandles" in `perldoc perlvar` and `perldoc -f local`.

Incidentally, if you can put your script on the server, you can have all the modules you want. See "How do I keep my own module/library directory?".
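A complete example of the `local $/;` approach (the filename is an assumption):

```perl
use strict;
use warnings;

my $file = 'page.html';
open my $fh, '<', $file or die "Cannot open $file: $!";

my $document = do {
    local $/;    # record separator is undef inside this block only
    <$fh>;       # so a single read returns the entire file
};
close $fh;
```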
In addition, Path::Class::File allows you to slurp and spew.

Path::Tiny gives even more convenience methods, such as `slurp`, `slurp_raw`, and `slurp_utf8`, as well as their `spew` counterparts.

Sinan Ünür
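For instance (assuming Path::Tiny is installed; the filenames are illustrative):

```perl
use strict;
use warnings;
use Path::Tiny;

my $html = path('page.html')->slurp_utf8;    # whole file as one decoded string
path('copy.html')->spew_utf8($html);         # write it back out
```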
I would do it like this:
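The original code block was lost here; a reconstruction consistent with the description (three-argument open, lexical filehandle) would be:

```perl
use strict;
use warnings;

my $file = 'index.html';
my $document = do {
    local $/ = undef;           # slurp mode
    open my $fh, '<', $file     # three-argument open, lexical handle
        or die "could not open $file: $!";
    <$fh>;                      # $fh closes when it goes out of scope
};
```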
Note the use of the three-argument version of open. It is much safer than the old two- (or one-) argument versions. Also note the use of a lexical filehandle. Lexical filehandles are nicer than the old bareword variants, for many reasons. We are taking advantage of one of them here: they close when they go out of scope.
Chas. Owens
All the posts are slightly non-idiomatic. The idiom is:
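The code block did not survive extraction; the idiom referred to is presumably the `do`-block slurp:

```perl
my $filename = 'page.html';    # assumed filename
open my $fh, '<', $filename or die "Cannot open $filename: $!";
my $content = do { local $/; <$fh> };
```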
Mostly, there is no need to set `$/` to `undef`.

jrockway
From perlfaq5: How can I read in an entire file all at once?:
You can use the File::Slurp module to do it in one step.
The customary Perl approach for processing all the lines in a file is to do so one line at a time:
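Given an open filehandle `$input`, that loop looks roughly like this:

```perl
while (<$input>) {
    chomp;
    # do something with $_, one line at a time
}
```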
This is tremendously more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often--if not almost always--the wrong approach. Whenever you see someone do this:
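The anti-pattern being warned about is presumably the whole-file list read:

```perl
my @lines = <INPUT>;    # slurps every line into memory at once
```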
you should think long and hard about why you need everything loaded at once. It's just not a scalable solution. You might also find it more fun to use the standard Tie::File module, or the DB_File module's $DB_RECNO bindings, which allow you to tie an array to a file so that accessing an element of the array actually accesses the corresponding line in the file.
You can read the entire filehandle contents into a scalar.
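In the FAQ this is done in a bare block, roughly (`$file` is assumed to hold the filename):

```perl
my $var;
{
    local $/;    # record separator undef'd until the block ends
    open my $fh, '<', $file or die "Cannot open $file: $!";
    $var = <$fh>;
}    # $fh closes automatically at block exit
```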
That temporarily undefs your record separator, and will automatically close the file at block exit. If the file is already open, just use this:
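That is, for an already-open handle `$fh`:

```perl
my $var = do { local $/; <$fh> };
```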
For ordinary files you can also use the read function.
The third argument tests the byte size of the data on the INPUT filehandle and reads that many bytes into the buffer $var.
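A sketch (the filename is assumed):

```perl
my $file = 'page.html';
open my $fh, '<', $file or die "Cannot open $file: $!";

my $var;
read $fh, $var, -s $fh;    # -s gives the size in bytes, so this reads it all
close $fh;
```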
brian d foy
A simple way is:
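The snippet itself was lost; presumably it was a concatenation loop along these lines (filename assumed):

```perl
my $document = '';
open my $fh, '<', 'page.html' or die "Cannot open page.html: $!";
$document .= $_ while <$fh>;
close $fh;
```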
Another way is to change the input record separator, `$/`. You can do this locally in a bare block to avoid changing the global record separator.
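Sketched (filename assumed):

```perl
my $document;
{
    local $/;    # affects $/ only until this bare block ends
    open my $fh, '<', 'page.html' or die "Cannot open page.html: $!";
    $document = <$fh>;
}
```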
user100177
Either set `$/` to `undef` (see jrockway's answer) or just concatenate all the file's lines:

It's recommended to use scalars for filehandles on any Perl version that supports them.
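The concatenation approach, using a lexical scalar filehandle as recommended (filename assumed):

```perl
open my $fh, '<', 'page.html' or die "Cannot open page.html: $!";
my $content = join '', <$fh>;    # list context, then glue the lines together
close $fh;
```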
Eugene Yarmash
You're only getting the first line from the diamond operator `<FILE>` because you're evaluating it in scalar context. In list context, the diamond operator will return all the lines of the file.
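Sketched with a bareword handle as in the question (the filename is assumed):

```perl
open FILE, '<', 'page.html' or die "Cannot open page.html: $!";

my $first_line = <FILE>;    # scalar context: one line only
my @all_lines  = <FILE>;    # list context: every remaining line

close FILE;
```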
Nathan
I would do it in the simplest way, so anyone can understand what happens, even if there are smarter ways:
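The answer's code was elided; from the explanation that follows, it was presumably:

```perl
open f, '<', 'file.html' or die "Cannot open file.html: $!";
my $string = join '', <f>;
close f;
```

(A lexical handle, `open my $f, ...`, would be preferred in modern code.)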
`<f>` returns the list of lines from our file (if `$/` has its default value, `"\n"`), and `join ''` then glues that list into a single string.

This is more of a suggestion on how NOT to do it. I've just had a bad time finding a bug in a rather big Perl application. Most of the modules had their own configuration files. To read a configuration file as a whole, I found this single line of Perl somewhere on the Internet:
It reassigns the line separator as explained before, but it also reassigns STDIN. This had at least one side effect that cost me hours to find: it does not close the implicit file handle properly, since it never calls `close` at all.

For example, doing that:
results in:
The strange thing is that the line counter `$.` is increased by one for every file. It is not reset, and it does not contain the number of lines; it is not reset to zero when opening another file until at least one line has been read. In my case, I was doing something like this:

Because of this problem, the condition was false, since the line counter was not reset properly. I don't know if this is a bug or simply wrong code... Calling `close;` or `close STDIN;` does not help either.

I replaced this unreadable code with open, string concatenation, and close. However, the solution posted by Brad Gilbert also works, since it uses an explicit file handle instead.
The three lines at the beginning can be replaced by:
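Presumably something like this reconstruction (`$file` holds the configuration filename):

```perl
open my $fh, '<', $file or die "Cannot open $file: $!";
my $content = do { local $/; <$fh> };
close $fh;    # explicit close; the lexical handle would also close at scope exit
```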
which properly closes the file handle.
Use `$/ = undef;` before `$document = <FILE>;`. `$/` is the input record separator, which is a newline by default. By redefining it to `undef`, you are saying there is no record separator, so one read returns the whole file. This is called 'slurp' mode.

Other solutions like `undef $/` and `local $/` (but not `my $/`) redefine `$/` and thus produce the same effect.

I don't know if it's good practice, but I used to use this:
These are all good answers. But if you're feeling lazy, the file isn't that big, and security is not an issue (you know you don't have a tainted filename), then you can shell out:
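For example, using backticks to delegate to `cat` (Unix-only, and unsafe if `$file` could contain shell metacharacters; the filename is an assumption):

```perl
my $file = 'page.html';
my $file_as_string = `cat $file`;
die "cat failed" if $?;
```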